Research Article
Year 2024, Volume: 1 Issue: 2, 63 - 70, 20.12.2024

Histogram-Based Feature Selection for Binary Classification

Abstract

This paper presents a novel histogram-based scoring method for feature selection in binary classification tasks. The method, called Histogram-Based Feature Selection (HBFS), scores each feature by how much the distribution of its values differs between the positive and negative classes, and uses that score to identify the most informative features. HBFS was evaluated on a variety of datasets and compared with the Fisher Score. Our findings indicate that HBFS matches or outperforms the Fisher Score on most datasets.
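The abstract does not give the HBFS formula, so the following is only a minimal sketch of the general idea, under stated assumptions: labels are coded 0/1, each feature is binned into a shared set of equal-width bins, and the score is the total variation distance between the two class-conditional histograms. The function names, the `bins` parameter, and the choice of distance are illustrative and are not taken from the paper. The Fisher Score is included in its standard textbook form for comparison.

```python
import numpy as np

def histogram_score(x, y, bins=10):
    """Distance between class-conditional histograms of one feature.

    ASSUMPTION: total variation distance over shared equal-width bins.
    This is an illustrative stand-in, not the paper's HBFS definition.
    """
    edges = np.histogram_bin_edges(x, bins=bins)  # same bins for both classes
    pos, _ = np.histogram(x[y == 1], bins=edges)
    neg, _ = np.histogram(x[y == 0], bins=edges)
    p = pos / max(pos.sum(), 1)  # normalize counts to proportions
    q = neg / max(neg.sum(), 1)
    return 0.5 * np.abs(p - q).sum()  # 0 = identical, 1 = disjoint

def fisher_score(x, y):
    """Standard Fisher Score: between-class over within-class variance."""
    mu = x.mean()
    num, den = 0.0, 0.0
    for c in np.unique(y):
        xc = x[y == c]
        num += xc.size * (xc.mean() - mu) ** 2
        den += xc.size * xc.var()
    if den == 0:
        # Feature is constant within every class.
        return np.inf if num > 0 else 0.0
    return num / den

def rank_features(X, y, score_fn):
    """Return feature indices sorted from highest to lowest score."""
    scores = np.array([score_fn(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    y = np.repeat([0, 1], n)
    X = np.column_stack([
        rng.normal(loc=y * 2.0, scale=1.0),        # classes differ in mean
        rng.normal(loc=0.0, scale=1.0 + 2.0 * y),  # classes differ in spread only
        rng.normal(size=2 * n),                    # pure noise
    ])
    for name, fn in [("histogram", histogram_score), ("fisher", fisher_score)]:
        order, scores = rank_features(X, y, fn)
        print(name, order, np.round(scores, 3))
```

In this synthetic setup the second feature has the same mean in both classes and differs only in spread, which is the kind of case where a score built on the full distribution can respond even though a mean-based score has little to work with. Whether the paper's HBFS behaves the same way depends on its actual definition.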

References

  • Li, K., Wang, F., Yang, L., & Liu, R. (2023). Deep feature screening: Feature selection for ultra high-dimensional data via deep neural networks. Neurocomputing, 538, 126186.
  • Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. Hoboken, NJ: Wiley.
  • Abiodun, E. O., Alabdulatif, A., Abiodun, O. I., Alawida, M., Alabdulatif, A., & Alkhawaldeh, R. S. (2021). A systematic review of emerging feature selection optimization methods for optimal text classification: the present state and prospective opportunities. Neural Computing and Applications, 33(22), 15091-15118.
  • Gan, M., & Zhang, L. (2021). Iteratively local fisher score for feature selection. Applied Intelligence, 51, 6167-6181.
  • He, X., Cai, D., & Niyogi, P. (2005). Laplacian score for feature selection. Advances in neural information processing systems, 18.
  • Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of machine learning research, 3(Mar), 1157-1182.
  • Khan, Z., Ali, A., & Aldahmani, S. (2024). Feature Selection via Robust Weighted Score for High Dimensional Binary Class-Imbalanced Gene Expression Data. arXiv preprint arXiv:2401.12667.
  • Jagdhuber, R., Lang, M., Stenzl, A., Neuhaus, J., & Rahnenführer, J. (2020). Cost-Constrained feature selection in binary classification: adaptations for greedy forward selection and genetic algorithms. BMC bioinformatics, 21, 1-21.
  • Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on pattern analysis and machine intelligence, 27(8), 1226-1238.
  • Datasets: Feature selection. (n.d.). Retrieved from https://jundongl.github.io/scikit-feature/datasets.html
  • Nardone, D. (2019). Biological datasets for SMBA. https://doi.org/10.5281/zenodo.2709491
  • GEO Accession viewer. (n.d.). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE4412
  • Freije, W. A., Castro-Vargas, F. E., Fang, Z., Horvath, S., Cloughesy, T., Liau, L. M., Mischel, P. S., & Nelson, S. F. (2004). Gene expression profiling of gliomas strongly predicts survival. Cancer research, 64(18), 6503–6510. https://doi.org/10.1158/0008-5472.CAN-04-0452
  • Spira, A., Beane, J. E., Shah, V., Steiling, K., Liu, G., Schembri, F., Gilman, S., Dumas, Y. M., Calner, P., Sebastiani, P., Sridhar, S., Beamis, J., Lamb, C., Anderson, T., Gerry, N., Keane, J., Lenburg, M. E., & Brody, J. S. (2007). Airway epithelial gene expression in the diagnostic evaluation of smokers with suspect lung cancer. Nature medicine, 13(3), 361–366. https://doi.org/10.1038/nm1556
  • Gustafson, A. M., Soldi, R., Anderlind, C., Scholand, M. B., Qian, J., Zhang, X., Cooper, K., Walker, D., McWilliams, A., Liu, G., Szabo, E., Brody, J., Massion, P. P., Lenburg, M. E., Lam, S., Bild, A. H., & Spira, A. (2010). Airway PI3K pathway activation is an early and reversible event in lung cancer development. Science translational medicine, 2(26), 26ra25. https://doi.org/10.1126/scitranslmed.3000251
  • Wayback machine. (n.d.). https://web.archive.org/web/20150221003104/http://www.nipsfsc.ecs.soton.ac.uk/papers/NIPS2003-Datasets.pdf
  • Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., ... & Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. arXiv preprint arXiv:1309.0238.
  • Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine learning, 63, 3-42.

Details

Primary Language English
Subjects Machine Learning Algorithms, Classification Algorithms
Journal Section Research Article
Authors

Selman Delil 0000-0001-8149-3561

Melih Ağraz

Birol Kuyumcu

Publication Date December 20, 2024
Submission Date November 28, 2024
Acceptance Date December 12, 2024
Published in Issue Year 2024 Volume: 1 Issue: 2

Cite

APA Delil, S., Ağraz, M., & Kuyumcu, B. (2024). Histogram-Based Feature Selection for Binary Classification. Transactions on Computer Science and Applications, 1(2), 63-70.