Histogram-Based Feature Selection for Binary Classification

Selman Delil; Melih Ağraz; Birol Kuyumcu

Research Article

Year 2024, Volume: 1 Issue: 2, 63 - 70, 20.12.2024

Abstract

This paper presents a novel feature selection method for binary classification tasks based on histogram scoring. The method exploits differences between the distributions of feature values in the positive and negative classes to assign each feature a score, which is then used to identify the most informative features. The method, called Histogram-Based Feature Selection (HBFS), was evaluated on a variety of datasets and compared with the Fisher Score. Our findings indicate that HBFS matches or outperforms the Fisher Score on most datasets.
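The abstract does not give the HBFS scoring formula, so the following is only a minimal sketch of the general idea: score each feature by how differently its values are distributed in the two classes, and compare that with a Fisher Score baseline. The specific histogram score used here (total variation distance between class-conditional histograms on shared bins), the default bin count, and all function names are assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def histogram_score(x, y, bins=10):
    """Illustrative histogram-based score for one feature (assumed form, not HBFS itself).

    Returns the total variation distance between the class-conditional
    histograms: 0.0 for identical histograms, 1.0 for no overlap.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    # Use shared bin edges so the two histograms are directly comparable.
    edges = np.histogram_bin_edges(x, bins=bins)
    pos, _ = np.histogram(x[y == 1], bins=edges)
    neg, _ = np.histogram(x[y == 0], bins=edges)
    # Convert counts to relative frequencies (guard against an empty class).
    p = pos / max(pos.sum(), 1)
    q = neg / max(neg.sum(), 1)
    return 0.5 * np.abs(p - q).sum()


def fisher_score(x, y):
    """Standard two-class Fisher Score: between-class over within-class scatter."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    x1, x0 = x[y == 1], x[y == 0]
    mu = x.mean()
    between = len(x1) * (x1.mean() - mu) ** 2 + len(x0) * (x0.mean() - mu) ** 2
    within = len(x1) * x1.var() + len(x0) * x0.var()
    return between / within if within > 0 else 0.0


def rank_features(X, y, score_fn, **kwargs):
    """Score every column of X and return (indices sorted best-first, scores)."""
    X = np.asarray(X, dtype=float)
    scores = np.array([score_fn(X[:, j], y, **kwargs) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores
```

Calling `rank_features(X, y, histogram_score)` and `rank_features(X, y, fisher_score)` on the same data gives two rankings that can be compared by training a classifier on the top-ranked features from each. One reason to consider a histogram score is that it can react to differences in distribution shape even when the class means coincide, a case in which the Fisher Score is zero.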

Keywords

Machine Learning, Feature Selection, Histogram-Based Feature Selection, Fisher Score

References

Li, K., Wang, F., Yang, L., & Liu, R. (2023). Deep feature screening: Feature selection for ultra high-dimensional data via deep neural networks. Neurocomputing, 538, 126186.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification. Hoboken, NJ: Wiley.
Abiodun, E. O., Alabdulatif, A., Abiodun, O. I., Alawida, M., Alabdulatif, A., & Alkhawaldeh, R. S. (2021). A systematic review of emerging feature selection optimization methods for optimal text classification: the present state and prospective opportunities. Neural Computing and Applications, 33(22), 15091-15118.
Gan, M., & Zhang, L. (2021). Iteratively local Fisher score for feature selection. Applied Intelligence, 51, 6167-6181.
He, X., Cai, D., & Niyogi, P. (2005). Laplacian score for feature selection. Advances in neural information processing systems, 18.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of machine learning research, 3(Mar), 1157-1182.
Khan, Z., Ali, A., & Aldahmani, S. (2024). Feature Selection via Robust Weighted Score for High Dimensional Binary Class-Imbalanced Gene Expression Data. arXiv preprint arXiv:2401.12667.
Jagdhuber, R., Lang, M., Stenzl, A., Neuhaus, J., & Rahnenführer, J. (2020). Cost-Constrained feature selection in binary classification: adaptations for greedy forward selection and genetic algorithms. BMC bioinformatics, 21, 1-21.
Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on pattern analysis and machine intelligence, 27(8), 1226-1238.
Datasets: Feature selection. (n.d.). Retrieved from https://jundongl.github.io/scikit-feature/datasets.html
Nardone, D. (2019). Biological datasets for SMBA. https://doi.org/10.5281/zenodo.2709491
GEO Accession viewer. (n.d.). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4412
Freije, W. A., Castro-Vargas, F. E., Fang, Z., Horvath, S., Cloughesy, T., Liau, L. M., Mischel, P. S., & Nelson, S. F. (2004). Gene expression profiling of gliomas strongly predicts survival. Cancer research, 64(18), 6503–6510. https://doi.org/10.1158/0008-5472.CAN-04-0452
Spira, A., Beane, J. E., Shah, V., Steiling, K., Liu, G., Schembri, F., Gilman, S., Dumas, Y. M., Calner, P., Sebastiani, P., Sridhar, S., Beamis, J., Lamb, C., Anderson, T., Gerry, N., Keane, J., Lenburg, M. E., & Brody, J. S. (2007). Airway epithelial gene expression in the diagnostic evaluation of smokers with suspect lung cancer. Nature medicine, 13(3), 361–366. https://doi.org/10.1038/nm1556
Gustafson, A. M., Soldi, R., Anderlind, C., Scholand, M. B., Qian, J., Zhang, X., Cooper, K., Walker, D., McWilliams, A., Liu, G., Szabo, E., Brody, J., Massion, P. P., Lenburg, M. E., Lam, S., Bild, A. H., & Spira, A. (2010). Airway PI3K pathway activation is an early and reversible event in lung cancer development. Science translational medicine, 2(26), 26ra25. https://doi.org/10.1126/scitranslmed.3000251
Wayback machine. (n.d.). https://web.archive.org/web/20150221003104/http://www.nipsfsc.ecs.soton.ac.uk/papers/NIPS2003-Datasets.pdf
Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., ... & Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. arXiv preprint arXiv:1309.0238.
Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine learning, 63, 3-42.

There are 17 citations in total.

Details

Primary Language	English
Subjects	Machine Learning Algorithms, Classification Algorithms
Journal Section	Research Article
Authors	Selman Delil (ORCID: 0000-0001-8149-3561), Melih Ağraz, Birol Kuyumcu
Publication Date	December 20, 2024
Submission Date	November 28, 2024
Acceptance Date	December 12, 2024
Published in Issue	Year 2024 Volume: 1 Issue: 2

Cite

APA	Delil, S., Ağraz, M., & Kuyumcu, B. (2024). Histogram-Based Feature Selection for Binary Classification. Transactions on Computer Science and Applications, 1(2), 63-70.
