A FEATURE SELECTION APPLICATION FOR CLASSIFICATION: A BANKING APPLICATION
Year 2022, 480-498, 28.11.2022
Emrah Sezer, Özgür Çakır
Abstract
The volume of operational data is steadily growing as a result of technological developments. The increasing amount and diversity of data create many difficulties, both in data analysis and in the evaluation of its results. Because the data passed to the analysis stage contains both relevant and irrelevant variables, the time and resources required for the analysis increase, while time and resources are always limited. The aim of this study is to improve the success of classification methods under these time and resource constraints by applying various feature selection methods. Feature selection methods were applied to a set of banking customer data to determine suitable feature subsets for classification, and a classification method was then applied to each selected subset. By comparing the classification results, the contribution of the feature selection methods to classification success was measured.
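The abstract does not name the feature selection methods that were compared. As a hedged illustration of the general filter idea only, the sketch below assumes discrete (categorical) customer attributes, scores each one by information gain, IG(C; X) = H(C) - H(C | X), and keeps the top k. The attribute names, the toy records and the choice of information gain are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(C; X) = H(C) - H(C | X) for one discrete feature column X."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Score every feature and sort from most to least informative."""
    scores = [
        (name, information_gain([row[j] for row in rows], labels))
        for j, name in enumerate(names)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def select_top_k(rows, labels, names, k):
    """Filter step: keep only the names of the k highest-ranked features."""
    return [name for name, _ in rank_features(rows, labels, names)[:k]]


# Hypothetical toy customer records: (salary_account, age_band, branch).
names = ["salary_account", "age_band", "branch"]
rows = [
    ("yes", "young", "A"), ("yes", "young", "B"),
    ("yes", "young", "A"), ("yes", "old", "B"),
    ("no", "young", "A"), ("no", "old", "B"),
    ("no", "old", "A"), ("no", "old", "B"),
]
labels = ["buy"] * 4 + ["no"] * 4

for name, ig in rank_features(rows, labels, names):
    print(f"{name}: {ig:.3f}")
print(select_top_k(rows, labels, names, k=2))
```

On this toy data `salary_account` separates the two classes perfectly (1 bit), `branch` carries no information about the class (0 bits), and `age_band` falls in between, so a top-2 filter drops `branch`. A filter of this kind never trains the classifier itself, which is what keeps it cheap when time and resources are limited.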
VARIABLE SUBSET SELECTION FOR CLASSIFICATION: A BANKING APPLICATION
Abstract
With technological developments, the amount of recorded operational data is steadily increasing. The growth in the volume and variety of data causes many difficulties both in the analysis stage and in the evaluation of the analysis results. Passing many relevant and irrelevant variables to the analysis stage increases the time and resources needed to carry out the analyses, and time and resources will always be limited. The aim of this study is to eliminate irrelevant variables by performing feature selection for classification on banking customer data, and thereby to contribute to the classification task. Classification was performed on the variable subsets selected with different feature selection methods, and the success of these methods was measured by comparing the classification results.
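The evaluation step described above, training a classifier on each selected subset and comparing the results, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the classifier is assumed to be a categorical naive Bayes with add-one smoothing, the score is leave-one-out accuracy, and the records and feature indices are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, features):
    """Count class priors and per-class value frequencies for the chosen feature indices."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (feature index, class) -> counts of each value
    values = defaultdict(set)      # feature index -> values seen in training
    for row, y in zip(rows, labels):
        for j in features:
            counts[(j, y)][row[j]] += 1
            values[j].add(row[j])
    return priors, counts, values


def predict_nb(model, row, features):
    """Return the class with the highest log-posterior score."""
    priors, counts, values = model
    n = sum(priors.values())
    best, best_score = None, -math.inf
    for c, n_c in priors.items():
        score = math.log(n_c / n)
        for j in features:
            # Add-one (Laplace) smoothing over the values seen for feature j.
            score += math.log((counts[(j, c)][row[j]] + 1) / (n_c + len(values[j])))
        if score > best_score:
            best, best_score = c, score
    return best


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of naive Bayes restricted to the given feature indices."""
    correct = 0
    for i in range(len(rows)):
        model = train_nb(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:], features)
        correct += predict_nb(model, rows[i], features) == labels[i]
    return correct / len(rows)


# Hypothetical toy records: (salary_account, age_band, branch) -> class.
rows = [
    ("yes", "young", "A"), ("yes", "young", "B"),
    ("yes", "young", "A"), ("yes", "old", "B"),
    ("no", "young", "A"), ("no", "old", "B"),
    ("no", "old", "A"), ("no", "old", "B"),
]
labels = ["buy"] * 4 + ["no"] * 4

subsets = {"all features": [0, 1, 2], "selected subset": [0]}
for name, features in subsets.items():
    print(name, loo_accuracy(rows, labels, features))
```

On these eight records the single selected attribute reaches a leave-one-out accuracy of 1.0, while all three attributes together reach 0.75, so here the extra attributes cost accuracy as well as computation. On real data the direction and size of such a difference depend on the data set and the classifier, which is why the comparison has to be made empirically.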
Kaynakça
- Abraham, R., J. B. Simha & S. S. Iyengar (2009) Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining. International Journal of Computational Intelligence Research, 5, 116-129.
- Alhaj, T. A., M. M. Siraj, A. Zainal, H. T. Elshoush & F. Elhaj (2016) Feature Selection Using Information Gain for Improved Structural-Based Alert Correlation. PloS one, 11, e0166017.
- Ali, S. I. & W. Shahzad. 2012. A feature subset selection method based on symmetric uncertainty and ant colony optimization. In Emerging Technologies (ICET), 2012 International Conference on, 1-6. IEEE.
- Blum, A. L. & P. Langley (1997) Selection of relevant features and examples in machine learning. Artificial intelligence, 97, 245-271.
- Das, S. 2001. Filters, wrappers and a boosting-based hybrid for feature selection. In ICML, 74-81.
- Dash, M. & H. Liu (1997) Feature selection for classification. Intelligent data analysis, 1, 131-156.
- Guyon, I. & A. Elisseeff (2003) An introduction to variable and feature selection. Journal of machine learning research, 3, 1157-1182.
- Hall, M. 2000. Correlation Based Feature Selection for Discrete and Numeric Class Machine Learning. In Proc. 17th Int'l. Conf. Machine Learning.
- Hall, M. A. 1999. Correlation-based feature selection for machine learning. The University of Waikato.
- Hall, M. A. & G. Holmes (2000) Benchmarking attribute selection techniques for data mining.
- Hall, M. A. & G. Holmes (2003) Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data engineering, 15, 1437-1447.
- Hall, M. A. & L. A. Smith (1997) Feature subset selection: a correlation based filter approach.
- Jungjit, S. 2016. New Multi-Label Correlation-Based Feature Selection Methods for Multi-Label Classification and Application in Bioinformatics. University of Kent.
- Kantardzic, M. 2011. Data mining: concepts, models, methods, and algorithms. John Wiley & Sons.
- Karegowda, A. G., A. Manjunath & M. Jayaram (2010) Comparative study of attribute selection using gain ratio and correlation based feature selection. International Journal of Information Technology and Knowledge Management, 2, 271-277.
- Kohavi, R. & G. H. John (1997) Wrappers for feature subset selection. Artificial intelligence, 97, 273-324.
- Ladha, L. & T. Deepa (2011) Feature selection methods and algorithms. International journal on computer science and engineering, 3, 1787-1797.
- Liu, H. & H. Motoda. 1998a. Feature extraction, construction and selection: A data mining perspective. Springer Science & Business Media.
- Liu, H. & H. Motoda. 1998b. Feature selection for knowledge discovery and data mining. Springer Science & Business Media.
- Pedrycz, W., G. Succi & A. Sillitti. 2016. Computational Intelligence and Quantitative Software Engineering. Springer.
- Piroonratana, T., W. Wongseree, T. Usavanarong, A. Assawamakin, C. Limwongse & N. Chaiyaratana. 2010. Identification of Ancestry Informative Markers from Chromosome-Wide Single Nucleotide Polymorphisms Using Symmetrical Uncertainty
Ranking. In Pattern Recognition (ICPR), 2010 20th International Conference on, 2448-2451. IEEE.
- Priyadarsini, R. P., M. Valarmathi & S. Sivakumari (2011) Gain ratio based feature selection method for privacy preservation. ICTACT J. Soft Comput, 1, 201-205.
- Shahbaz, M. B., X. Wang, A. Behnad & J. Samarabandu. 2016. On efficiency enhancement of the correlation-based feature selection for intrusion detection systems. In Information Technology, Electronics and Mobile Communication Conference (IEMCON), 2016 IEEE 7th Annual, 1-7. IEEE.
- Sun, Y., F. Wang, B. Wang, Q. Chen, N. Engerer & Z. Mi (2016) Correlation Feature Selection and Mutual Information Theory Based Quantitative Research on Meteorological Impact Factors of Module Temperature for Solar Photovoltaic Systems. Energies, 10, 7.
- Witten, I. H., E. Frank & M. A. Hall (2011) Data Mining: Practical Machine Learning Tools and Techniques.
- Yu, L. & H. Liu. 2003. Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th international conference on machine learning (ICML-03), 856-863.
- Yu, L. & H. Liu.(2004) Efficient feature selection via analysis of relevance and redundancy. Journal of machine learning research, 5, 1205-1224.