Research Article

Socio-demographic determinants of smoking: A data mining analysis of the Global Adult Tobacco Surveys

Year 2021, Volume 19, Issue 3, 251 - 262, 02.12.2021
https://doi.org/10.20518/tjph.884692

Abstract

Objective: This paper presents a) how the Global Adult Tobacco Survey (GATS) data can be used to extract valuable information about people's tobacco use behaviors and b) the prediction performance of the classification algorithms implemented on the GATS data.

Methods: Three well-known classification methods (K-nearest neighbor, the C4.5 algorithm, and the multilayer perceptron) were applied to assess how well the smoking status of GATS participants (pre-defined classes: smoker and non-smoker) can be classified from their socio-demographic characteristics (age group, gender, residence, education level, and working status). The first analysis was performed on the GATS data from Turkey. Subsequently, the model producing the best performance for Turkey was also applied to six other European countries: Greece, Kazakhstan, Poland, Romania, Russia, and Ukraine.

Results: All of the tree algorithms were more confident when classifying non-smokers. The correct classification rate of the C4.5 algorithm was the highest among the algorithms for the GATS Turkey data. In addition, the C4.5 algorithm classified males in more detail than females. The comparative analysis indicated that the C4.5 algorithm correctly classified the smoking status of over 80% of the participants from Ukraine, whereas the rate was below 70% for Greece. Thus, the effects of demographic factors on smoking status can vary from one country to another.

Conclusion: This paper indicates that the data supplied by GATS, such as demographic data, may help to estimate the likelihood that an individual will be a smoker in the future.
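The abstract names the C4.5 algorithm but does not describe how it builds a tree. As a minimal sketch (not the authors' implementation, which the abstract does not specify), C4.5 selects at each node the attribute with the highest gain ratio, that is, the information gain of the split divided by the entropy of the split itself. The records below are invented toy values, not GATS data, and the names RECORDS, ATTRIBUTES, gain_ratio, and best_attribute are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical toy records (NOT GATS data): (gender, residence, smoking status).
RECORDS = [
    ("male", "urban", "smoker"),
    ("male", "urban", "smoker"),
    ("male", "rural", "smoker"),
    ("male", "rural", "non-smoker"),
    ("female", "urban", "non-smoker"),
    ("female", "urban", "non-smoker"),
    ("female", "rural", "non-smoker"),
    ("female", "rural", "smoker"),
]
ATTRIBUTES = {"gender": 0, "residence": 1}  # attribute name -> tuple index
LABEL = 2  # tuple index of the class label


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(records, attr_index, label_index=LABEL):
    """Information gain of splitting on one attribute, divided by split information."""
    labels = [r[label_index] for r in records]
    # Group the class labels by the value of the candidate attribute.
    groups = {}
    for r in records:
        groups.setdefault(r[attr_index], []).append(r[label_index])
    # Weighted entropy of the labels that remains after the split.
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([r[attr_index] for r in records])
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(records, attributes=ATTRIBUTES):
    """Name of the attribute with the highest gain ratio (the candidate root split)."""
    return max(attributes, key=lambda name: gain_ratio(records, attributes[name]))


if __name__ == "__main__":
    for name, index in ATTRIBUTES.items():
        print(name, round(gain_ratio(RECORDS, index), 4))
    print("root split:", best_attribute(RECORDS))
```

On these toy records, gender separates the classes better than residence, so it would be chosen as the root split. A full C4.5 implementation would also recurse on each branch, handle continuous attributes and missing values, and prune the tree, none of which is shown here.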

Sigara kullanımının sosyo-demografik belirleyicileri: Küresel Yetişkin Tütün Araştırmaları üzerine bir veri madenciliği analizi

Details

Primary Language English
Subjects Health Care Sciences and Services
Journal Section Original Research
Authors

Zeynep DURMUŞOĞLU
Gaziantep University
0000-0001-7891-3764
Türkiye


Pınar KOCABEY ÇİFTÇİ (Primary Author)
Gaziantep University
0000-0003-0877-8127
Türkiye

Publication Date December 2, 2021
Application Date February 22, 2021
Acceptance Date October 1, 2021
Published in Issue Year 2021, Volume 19, Issue 3

Cite

BibTeX
@article{tjph884692,
  journal = {Turkish Journal of Public Health},
  eissn = {1304-1088},
  publisher = {Turkish Society of Public Health Specialists},
  year = {2021},
  volume = {19},
  pages = {251--262},
  doi = {10.20518/tjph.884692},
  title = {Socio-demographic determinants of smoking: A data mining analysis of the Global Adult Tobacco Surveys},
  key = {cite},
  author = {Durmuşoğlu, Zeynep and Kocabey Çiftçi, Pınar}
}
APA Durmuşoğlu, Z. & Kocabey Çiftçi, P. (2021). Socio-demographic determinants of smoking: A data mining analysis of the Global Adult Tobacco Surveys. Turkish Journal of Public Health, 19(3), 251-262. DOI: 10.20518/tjph.884692



TURKISH JOURNAL OF PUBLIC HEALTH - TURK J PUBLIC HEALTH. online-ISSN: 1304-1096 

Copyright holder Turkish Journal of Public Health. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.