Socio-demographic determinants of smoking: A data mining analysis of the Global Adult Tobacco Surveys
Abstract
Objective: This paper presents (a) how Global Adult Tobacco Survey (GATS) data can be used to extract valuable information about people's tobacco use behavior and (b) how well the implemented classification algorithms predict smoking status on the GATS data.
Methods: Three well-known classification methods (K-nearest neighbor, the C4.5 algorithm, and the multilayer perceptron) were applied to classify the smoking status of GATS participants (pre-defined classes: smoker and non-smoker) based on their socio-demographic characteristics (age group, gender, residence, education level, and working status). The first analysis was performed on the GATS data from Turkey. Subsequently, the model that performed best for Turkey was also applied to six other European countries: Greece, Kazakhstan, Poland, Romania, Russia, and Ukraine.
Results: All three algorithms classified non-smokers more reliably than smokers. The C4.5 algorithm achieved the highest correct classification rate among the algorithms on the GATS Turkey data. In addition, the C4.5 algorithm classified males in more detail than females. The comparative analysis indicated that the C4.5 algorithm correctly classified the smoking status of more than 80% of participants from Ukraine, whereas the rate was below 70% for Greece. Thus, the effects of demographic factors on smoking status can differ from one country to another.
Conclusion: This paper indicates that data supplied by GATS, such as demographic data, may help to estimate the likelihood that an individual will be a smoker in the future.
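As a rough illustration of how a C4.5-style tree might pick a socio-demographic attribute to split on, the sketch below computes the gain ratio, the split criterion C4.5 is known for. It is a minimal sketch under stated assumptions, not the study's implementation. The records are invented toy data, not GATS data, and the names `entropy`, `gain_ratio`, and `records` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(records, attribute, target="smoker"):
    """Gain ratio of splitting `records` on `attribute` (C4.5 split criterion)."""
    total = len(records)
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted entropy of the target after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([r[target] for r in records]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([r[attribute] for r in records])
    return info_gain / split_info if split_info > 0 else 0.0


# Invented toy records for illustration only (NOT taken from GATS).
records = [
    {"gender": "male", "residence": "urban", "smoker": "yes"},
    {"gender": "male", "residence": "rural", "smoker": "yes"},
    {"gender": "male", "residence": "urban", "smoker": "yes"},
    {"gender": "male", "residence": "rural", "smoker": "no"},
    {"gender": "female", "residence": "urban", "smoker": "no"},
    {"gender": "female", "residence": "rural", "smoker": "no"},
    {"gender": "female", "residence": "urban", "smoker": "no"},
    {"gender": "female", "residence": "rural", "smoker": "yes"},
]

# Choose the attribute with the highest gain ratio as the first split.
attributes = ["gender", "residence"]
best = max(attributes, key=lambda a: gain_ratio(records, a))
print(best, round(gain_ratio(records, best), 4))
```

In this toy data, smoking status varies with gender but not with residence, so gender is chosen as the first split. With real survey data the ranking of attributes would depend on the country, which is consistent with the cross-country differences reported in the Results.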