Research Article

Effective Cancer Diagnosis through High-Dimensional Microarray Data Analysis by Integrating DCT and UFS

Year 2024, Volume: 7 Issue: 4, 693 - 704, 15.07.2024
https://doi.org/10.34248/bsengineering.1492652

Abstract

Cancer remains a global health challenge, with various types such as lung, breast, and colon cancer posing significant threats. Timely and accurate diagnosis is crucial for effective treatment and improved survival rates. Genetic research offers promising avenues in the fight against cancer, as identifying gene mutations and expression levels enables the development of targeted therapies and a deeper understanding of disease subtypes and progression. This study investigates a novel hybrid method aimed at improving the accuracy and efficiency of cancer diagnosis and classification. By combining Discrete Cosine Transformation (DCT) and Univariate Feature Selection (UFS) methods, the feature selection process is optimized for the dataset. The extracted features are then rigorously tested using established classifiers to assess their effectiveness in cancer classification. The proposed method's performance was evaluated using eight distinct datasets, and metrics such as MF1, K-score, and sensitivity were calculated and compared with various methods in the literature. Empirical evidence demonstrates that the proposed method outperforms others on 5 out of 8 datasets in terms of both accuracy and computational efficiency. The presented method represents a reliable tool for cancer diagnosis and classification.
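The abstract names the two building blocks (DCT and UFS) but not how they are wired together. The sketch below shows one plausible arrangement and is not the author's implementation. It rests on several assumptions: each sample's gene-expression vector is treated as a 1-D signal, the DCT is applied first, only the first `n_coeffs` coefficients are kept, the univariate score is the one-way ANOVA F statistic, and a nearest-centroid rule stands in for the "established classifiers". All function names, parameter values, and the synthetic data are hypothetical. Only NumPy is used to keep the example self-contained; in practice `scipy.fft.dct` and scikit-learn's `SelectKBest` with `f_classif` would be natural substitutes.

```python
import numpy as np

def dct2_rows(X):
    """Orthonormal DCT-II of each row of X (samples x genes)."""
    n = X.shape[1]
    k = np.arange(n)[:, None]   # frequency index
    t = np.arange(n)[None, :]   # position index
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)    # rescale the DC row so the basis is orthonormal
    return X @ basis.T

def anova_f_scores(X, y):
    """One-way ANOVA F statistic per column (assumed univariate score)."""
    classes = np.unique(y)
    grand = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(y) - len(classes)
    # Small constant guards against division by zero for constant columns.
    return (between / df_between) / (within / df_within + 1e-12)

def select_features(X_train, y_train, n_coeffs, k):
    """Pick the k best-scoring DCT coefficients using training data only."""
    Z = dct2_rows(X_train)[:, :n_coeffs]
    scores = anova_f_scores(Z, y_train)
    return np.sort(np.argsort(scores)[::-1][:k])

def transform(X, n_coeffs, idx):
    """Apply the same DCT truncation and coefficient selection to new data."""
    return dct2_rows(X)[:, :n_coeffs][:, idx]

def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Stand-in classifier: assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

# Synthetic stand-in for a microarray matrix: 60 samples, 200 "genes".
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
y = np.repeat([0, 1], 30)
X[y == 1, :20] += 1.5           # hypothetical class-dependent shift

perm = rng.permutation(60)
train, test = perm[:40], perm[40:]

idx = select_features(X[train], y[train], n_coeffs=50, k=10)
Z_train = transform(X[train], 50, idx)
Z_test = transform(X[test], 50, idx)
pred = nearest_centroid_predict(Z_train, y[train], Z_test)
accuracy = float((pred == y[test]).mean())
```

Selecting coefficients on the training split only, then reusing `idx` for the test split, avoids leaking label information into the evaluation. This matters for microarray data, where features far outnumber samples.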

References

  • Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ. 1999. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc National Acad Sci, 96(12): 6745-6750.
  • Alrefai N, Ibrahim O. 2022. Optimized feature selection method using particle swarm intelligence with ensemble learning for cancer classification based on microarray datasets. Neural Comput Appl, 34(16): 13513-13528.
  • Baliarsingh SK, Vipsita S, Muhammad K, Bakshi S. 2019. Analysis of high-dimensional biomedical data using an evolutionary multi-objective emperor penguin optimizer. Swarm Evol Comput, 48: 262-273.
  • Efe E, Özşen S. 2022. Comparison of time-frequency analyzes for a sleep staging application with CNN. J Biomimetics, Biomater Biomedic Eng, 55: 109-130.
  • Efe E, Ozsen S. 2023. CoSleepNet: Automated sleep staging using a hybrid CNN-LSTM network on imbalanced EEG-EOG datasets. Biomed Signal Proces Control, 80: 104299.
  • Efe E, Yavsan E. 2024. AttBiLFNet: A novel hybrid network for accurate and efficient arrhythmia detection in imbalanced ECG signals. Math Biosci Eng, 21(4): 5863-5880.
  • Er MJ, Chen W, Wu S. 2005. High-speed face recognition based on discrete cosine transform and RBF neural networks. IEEE Transact Neural Networks, 16(3): 679-691.
  • Gao L, Ye M, Lu X, Huang D. 2017. Hybrid method based on information gain and support vector machine for gene selection in cancer classification. Genomics Proteomics Bioinformatics, 15(6): 389-395.
  • Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, … Caligiuri MA. 1999. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439): 531-537.
  • Gunavathi C, Premalatha K. 2014. Performance analysis of genetic algorithm with kNN and SVM for feature selection in tumor classification. Int J Comput Info Eng, 8(8): 1490-1497.
  • Guyon I, Weston J, Barnhill S, Vapnik V. 2002. Gene selection for cancer classification using support vector machines. Machine Learn, 46: 389-422.
  • Kar S, Sharma K Das, Maitra M. 2015. Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive K-nearest neighborhood technique. Expert Syst Appl, 42(1): 612-627.
  • Kilicarslan S, Adem K, Celik M. 2020. Diagnosis and classification of cancer using hybrid model based on ReliefF and convolutional neural network. Medic Hypot, 137: 109577.
  • Kumar M, Rath SK. 2015. Classification of microarray using MapReduce based proximal support vector machine classifier. Knowledge-Based Syst, 89: 584-602.
  • Li L, Jiang W, Li X, Moser KL, Guo Z, Du L, Rao S. 2005. A robust hybrid between genetic algorithm and support vector machine for extracting an optimal feature gene subset. Genomics, 85(1): 16-23.
  • Luo K, Wang G, Li Q, Tao J. 2019. An improved SVM-RFE based on F-statistic and mPDC for gene selection in cancer classification. IEEE Access, 7: 147617-147628.
  • Maldonado S, Weber R, Basak J. 2011. Simultaneous feature selection and classification using kernel-penalized support vector machines. Info Sci, 181(1): 115-128.
  • Medjahed SA, Saadi TA, Benyettou A, Ouali M. 2017. Kernel-based learning and feature selection analysis for cancer diagnosis. Appl Soft Comput, 51: 39-48.
  • Meenachi L, Ramakrishnan S. 2021. Metaheuristic search based feature selection methods for classification of cancer. Pattern Recog, 119: 108079.
  • Mundra PA, Rajapakse JC. 2009. SVM-RFE with MRMR filter for gene selection. IEEE Transact Nanobiosci, 9(1): 31-37.
  • Naderi A, Teschendorff AE, Barbosa-Morais NL, Pinder SE, Green AR, Powe DG, Brenton JD. 2007. A gene-expression signature to predict survival in breast cancer across independent data sets. Oncogene, 26(10): 1507-1516.
  • Orhan H, Yavşan E. 2023. Artificial intelligence-assisted detection model for melanoma diagnosis using deep learning techniques. Math Mod Numeric Sim Appl, 3(2): 159-169.
  • Othman MS, Kumaran SR, Yusuf LM. 2020. Gene selection using hybrid multi-objective cuckoo search algorithm with evolutionary operators for cancer microarray data. IEEE Access, 8: 186348-186361.
  • Panda M. 2020. Elephant search optimization combined with deep neural network for microarray data analysis. J King Saud Univ Comput Info Sci, 32(8): 940-948.
  • Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, Kohn EC. 2002. Use of proteomic patterns in serum to identify ovarian cancer. The Lancet, 359(9306): 572-577.
  • Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, Lau C. 2002. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415(6870): 436-442.
  • Pragadeesh C, Jeyaraj R, Siranjeevi K, Abishek R, Jeyakumar G. 2019. Hybrid feature selection using micro genetic algorithm on microarray gene expression data. J Intel Fuzzy Syst, 36(3): 2241-2246.
  • Qaraad M, Amjad S, Manhrawy IIM, Fathi H, Hassan BA, El Kafrawy P. 2021. A hybrid feature selection optimization model for high dimension data classification. IEEE Access, 9: 42884-42895.
  • Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RCT, Pinkus GS. 2002. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medic, 8(1): 68-74.
  • Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Richie JP. 2002. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2): 203-209.
  • Sönmez ÖS, Dağtekin M, Ensari T. 2021. Gene expression data classification using genetic algorithm-based feature selection. Turkish J Elect Eng Comput Sci, 29(7): 3165-3179.
  • Sun L, Zhang X, Xu J, Wang W, Liu R. 2018. A gene selection approach based on the fisher linear discriminant and the neighborhood rough set. Bioengineered, 9(1): 144-151.
  • Van’t Veer LJ, Dai H, Van De Vijver MJ, He YD, Hart AAM, Mao M, Witteveen AT. 2002. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871): 530-536.
  • Zhang G, Hou J, Wang J, Yan C, Luo J. 2020. Feature selection for microarray data classification using hybrid information gain and a modified binary krill herd algorithm. Interdisciplinary Sci: Comput Life Sci, 12: 288-301.

There are 34 citations in total.

Details

Primary Language English
Subjects Biomedical Engineering (Other), Electrical Engineering (Other)
Journal Section Research Articles
Authors

Enes Efe 0000-0002-6136-6140

Publication Date July 15, 2024
Submission Date May 30, 2024
Acceptance Date July 1, 2024
Published in Issue Year 2024 Volume: 7 Issue: 4

Cite

APA Efe, E. (2024). Effective Cancer Diagnosis through High-Dimensional Microarray Data Analysis by Integrating DCT and UFS. Black Sea Journal of Engineering and Science, 7(4), 693-704. https://doi.org/10.34248/bsengineering.1492652
AMA Efe E. Effective Cancer Diagnosis through High-Dimensional Microarray Data Analysis by Integrating DCT and UFS. BSJ Eng. Sci. July 2024;7(4):693-704. doi:10.34248/bsengineering.1492652
Chicago Efe, Enes. “Effective Cancer Diagnosis through High-Dimensional Microarray Data Analysis by Integrating DCT and UFS”. Black Sea Journal of Engineering and Science 7, no. 4 (July 2024): 693-704. https://doi.org/10.34248/bsengineering.1492652.
EndNote Efe E (July 1, 2024) Effective Cancer Diagnosis through High-Dimensional Microarray Data Analysis by Integrating DCT and UFS. Black Sea Journal of Engineering and Science 7 4 693–704.
IEEE E. Efe, “Effective Cancer Diagnosis through High-Dimensional Microarray Data Analysis by Integrating DCT and UFS”, BSJ Eng. Sci., vol. 7, no. 4, pp. 693–704, 2024, doi: 10.34248/bsengineering.1492652.
ISNAD Efe, Enes. “Effective Cancer Diagnosis through High-Dimensional Microarray Data Analysis by Integrating DCT and UFS”. Black Sea Journal of Engineering and Science 7/4 (July 2024), 693-704. https://doi.org/10.34248/bsengineering.1492652.
JAMA Efe E. Effective Cancer Diagnosis through High-Dimensional Microarray Data Analysis by Integrating DCT and UFS. BSJ Eng. Sci. 2024;7:693–704.
MLA Efe, Enes. “Effective Cancer Diagnosis through High-Dimensional Microarray Data Analysis by Integrating DCT and UFS”. Black Sea Journal of Engineering and Science, vol. 7, no. 4, 2024, pp. 693-04, doi:10.34248/bsengineering.1492652.
Vancouver Efe E. Effective Cancer Diagnosis through High-Dimensional Microarray Data Analysis by Integrating DCT and UFS. BSJ Eng. Sci. 2024;7(4):693-704.
