Research Article

Gender Prediction from Social Media Comments with Artificial Intelligence

Year 2019, Volume: 23 Issue: 6, 1256 - 1264, 01.12.2019
https://doi.org/10.16984/saufenbilder.559452

Abstract

In the 21st century, which can be termed the age of artificial intelligence,
machine learning techniques that are becoming widespread and can improve
themselves can provide higher-quality services to humanity in many fields. As a
result of these developments, many companies now deliver their products and
services to their customers via social media accounts. However, not every
customer is interested in every product or service. Each customer's area of
interest is different, and gender is one of the main reasons for this
difference. If the gender of a social media user is determined correctly, sales
may be increased by offering the appropriate products or services. The main aim
of our study is to estimate the gender of commenters with machine learning
techniques by analyzing the comments on companies' Facebook posts. In the
study, comments were collected from Facebook and the genders of the commenters
were labelled according to their names. The data set was divided into training
and test data with a 70-30% split. The machine learning methods predicted with
similar accuracy rates, and the highest accuracy rate (74.13%) was obtained
with the logistic regression method.
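
The workflow summarized in the abstract (labelling commenters by name, a 70-30% split, and a logistic regression classifier) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the name lexicon, the toy comments, the TF-IDF features, and the helper names `label_by_first_name`, `build_dataset`, and `train_and_evaluate` are all assumptions introduced here, and the accuracy on such a tiny invented sample says nothing about the reported 74.13%.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical first-name lexicon; the study's actual name list is not given.
NAME_TO_GENDER = {
    "ayse": "female", "fatma": "female", "elif": "female",
    "mehmet": "male", "ahmet": "male", "mustafa": "male",
}

# Invented (commenter name, comment text) pairs, for illustration only.
RAW_COMMENTS = [
    ("Ayse Kaya", "love this dress is it available in red"),
    ("Fatma Demir", "the lipstick shade looks great on me"),
    ("Elif Sahin", "beautiful handbag when is the sale"),
    ("Ayse Yildiz", "these earrings are so elegant"),
    ("Fatma Celik", "ordered the skirt and it fits well"),
    ("Elif Arslan", "which perfume do you recommend"),
    ("Ayse Dogan", "the necklace arrived quickly thank you"),
    ("Fatma Kurt", "is this blouse available in small"),
    ("Elif Ozturk", "lovely scarf colors this season"),
    ("Ayse Aydin", "the makeup set is a perfect gift"),
    ("Mehmet Kaya", "what is the engine power of this car"),
    ("Ahmet Demir", "great match last night well played"),
    ("Mustafa Sahin", "does this drill come with a battery"),
    ("Mehmet Yildiz", "the razor blades are too expensive"),
    ("Ahmet Celik", "how many cores does this laptop have"),
    ("Mustafa Arslan", "best tires for winter driving"),
    ("Mehmet Dogan", "the football boots are very comfortable"),
    ("Ahmet Kurt", "is the console in stock this week"),
    ("Mustafa Ozturk", "good price for this tool set"),
    ("Mehmet Aydin", "the motor oil offer is a good deal"),
    ("Unknown Person", "nice product"),  # name not in lexicon, so skipped
]


def label_by_first_name(full_name):
    """Return 'female' or 'male' from the first name, or None if unknown."""
    parts = full_name.strip().lower().split()
    return NAME_TO_GENDER.get(parts[0]) if parts else None


def build_dataset(raw_comments):
    """Keep only comments whose author's first name is in the lexicon."""
    texts, labels = [], []
    for name, text in raw_comments:
        gender = label_by_first_name(name)
        if gender is not None:
            texts.append(text)
            labels.append(gender)
    return texts, labels


def train_and_evaluate(texts, labels, seed=0):
    """Split 70-30%, fit TF-IDF + logistic regression, return test accuracy."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.30, stratify=labels, random_state=seed
    )
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(x_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(x_test))
    return model, accuracy, len(x_train), len(x_test)


if __name__ == "__main__":
    texts, labels = build_dataset(RAW_COMMENTS)
    model, accuracy, n_train, n_test = train_and_evaluate(texts, labels)
    print(f"train={n_train} test={n_test} accuracy={accuracy:.2f}")
```

Labelling by first name is itself a noisy heuristic (unisex names, nicknames, and page accounts are mislabelled or dropped), so any accuracy measured this way is relative to the name-derived labels rather than to the commenters' actual gender.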


There are 43 citations in total.

Details

Primary Language English
Subjects Artificial Intelligence
Journal Section Research Articles
Authors

Özer Çelik 0000-0002-4409-3101

Ahmet Faruk Aslan 0000-0003-1583-6508

Publication Date December 1, 2019
Submission Date April 30, 2019
Acceptance Date September 3, 2019
Published in Issue Year 2019

Cite

APA Çelik, Ö., & Aslan, A. F. (2019). Gender Prediction from Social Media Comments with Artificial Intelligence. Sakarya University Journal of Science, 23(6), 1256-1264. https://doi.org/10.16984/saufenbilder.559452
AMA Çelik Ö, Aslan AF. Gender Prediction from Social Media Comments with Artificial Intelligence. SAUJS. December 2019;23(6):1256-1264. doi:10.16984/saufenbilder.559452
Chicago Çelik, Özer, and Ahmet Faruk Aslan. “Gender Prediction from Social Media Comments With Artificial Intelligence”. Sakarya University Journal of Science 23, no. 6 (December 2019): 1256-64. https://doi.org/10.16984/saufenbilder.559452.
EndNote Çelik Ö, Aslan AF (December 1, 2019) Gender Prediction from Social Media Comments with Artificial Intelligence. Sakarya University Journal of Science 23 6 1256–1264.
IEEE Ö. Çelik and A. F. Aslan, “Gender Prediction from Social Media Comments with Artificial Intelligence”, SAUJS, vol. 23, no. 6, pp. 1256–1264, 2019, doi: 10.16984/saufenbilder.559452.
ISNAD Çelik, Özer - Aslan, Ahmet Faruk. “Gender Prediction from Social Media Comments With Artificial Intelligence”. Sakarya University Journal of Science 23/6 (December 2019), 1256-1264. https://doi.org/10.16984/saufenbilder.559452.
JAMA Çelik Ö, Aslan AF. Gender Prediction from Social Media Comments with Artificial Intelligence. SAUJS. 2019;23:1256–1264.
MLA Çelik, Özer and Ahmet Faruk Aslan. “Gender Prediction from Social Media Comments With Artificial Intelligence”. Sakarya University Journal of Science, vol. 23, no. 6, 2019, pp. 1256-64, doi:10.16984/saufenbilder.559452.
Vancouver Çelik Ö, Aslan AF. Gender Prediction from Social Media Comments with Artificial Intelligence. SAUJS. 2019;23(6):1256-64.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.