Research Article

A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size

Year 2020, Volume 7, Issue 2, 15 - 26, 30.12.2020
https://doi.org/10.33200/ijcer.733720

Abstract

This study examines how the classification performance of logistic regression and CHAID analysis changes with sample size. The dataset was obtained using the “Attentional Control Scale.” The scale was administered to 1824 students, and the analyses were conducted on samples drawn at random from this dataset. Nine classification criteria were determined to evaluate the classification performance of the two methods, and the results were interpreted against these criteria. The analyses showed that the classification performance of logistic regression did not change as sample size increased, and that logistic regression classified better than CHAID analysis in small samples (N between 25 and 900). The classification performance of CHAID analysis, by contrast, improved as sample size increased and yielded stronger findings in large samples (N = 1000 and above). Moreover, in classification studies logistic regression analysis yielded more reliable results, whereas CHAID analysis provided stronger classifications. These results suggest that researchers should select the method for a classification study based on sample size.
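
The general procedure described above (draw random samples of different sizes, classify, then score the classifications) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not list the nine criteria, so the seven measures computed here (accuracy, sensitivity, specificity, predictive values, likelihood ratios) are commonly used examples and not necessarily the authors' set. The function names, the `classify` callable, and the sample sizes in the usage note are hypothetical, and the sketch does not fit logistic regression or CHAID itself.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def safe_div(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_criteria(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute common binary classification criteria (labels coded 0/1).

    Assumption: these measures are illustrative; the abstract does not
    name the nine criteria used in the study.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = safe_div(tp, tp + fn)
    spec = safe_div(tn, tn + fp)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": sens,
        "specificity": spec,
        "ppv": safe_div(tp, tp + fp),          # positive predictive value
        "npv": safe_div(tn, tn + fn),          # negative predictive value
        "lr_plus": safe_div(sens, 1 - spec),   # positive likelihood ratio
        "lr_minus": safe_div(1 - sens, spec),  # negative likelihood ratio
    }


def criteria_by_sample_size(records: List[Tuple[object, int]],
                            classify: Callable[[object], int],
                            sizes: Sequence[int],
                            seed: int = 0) -> Dict[int, Dict[str, float]]:
    """Score a classifier on random samples of several sizes.

    `records` holds (features, label) pairs and `classify` is a
    hypothetical stand-in for an already fitted model; each sample is
    drawn without replacement from the full dataset.
    """
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        sample = rng.sample(records, n)
        y_true = [label for _, label in sample]
        y_pred = [classify(features) for features, _ in sample]
        results[n] = classification_criteria(y_true, y_pred)
    return results
```

With a dataset of 1824 records, a call such as `criteria_by_sample_size(records, classify, sizes=[25, 900, 1000])` returns one dictionary of criteria per sample size, which can then be compared across methods. In a fuller comparison each method would be refitted on every sample rather than applied as a fixed classifier.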


Details

Primary Language English
Journal Section Articles
Authors

Mehmet Şata 0000-0003-2683-4997

Fuat Elkonca 0000-0002-2733-8891

Publication Date December 30, 2020
Published in Issue Year 2020, Volume 7, Issue 2

Cite

APA Şata, M., & Elkonca, F. (2020). A Comparison of Classification Performances between the Methods of Logistics Regression and CHAID Analysis in accordance with Sample Size. International Journal of Contemporary Educational Research, 7(2), 15-26. https://doi.org/10.33200/ijcer.733720


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

IJCER (International Journal of Contemporary Educational Research) ISSN: 2148-3868