Türkçe Tweetlerden Makine Öğrenmesi ile Meslek Tahmini

İslam Mayda

doi:10.31590/ejosat.1168269

Research Article

Predicting Occupation with Machine Learning from Turkish Tweets

Year 2022, Issue: 40, 55 - 60, 30.09.2022

Abstract

As social media platforms spread and their user numbers continue to climb, the amount of data produced on social media is growing rapidly. One goal of research that extracts information from this data is occupation prediction. The occupations of social media users can be used in many areas, particularly in smart recommendation systems. This study aims to predict occupation from Turkish tweets. An occupation dataset of 25,000 Turkish tweets was created and shared publicly. Various preprocessing steps were applied to the dataset, and feature sets were extracted from both the words themselves and the word roots. In the tests, tweets were used both individually and combined in groups of 5 and 10. The experiments applied the Support Vector Machine and Logistic Regression methods, and the tests were repeated with feature selection. The best accuracy was 74.90% in the experiments with individual tweets, 96.20% with tweets combined in groups of 5, and 99.00% with tweets combined in groups of 10. Using word roots performed better than using the words themselves, and feature selection generally improved performance. The study closes with a discussion of these results and suggestions for future work.
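The grouping step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that tweets sharing the same occupation label are concatenated, in order, into documents of a fixed size, and that a leftover block smaller than the group size is dropped. The abstract does not state how groups were actually formed (for example per user or per occupation), so the function name `combine_tweets`, the grouping rule, and the toy data are all hypothetical.

```python
from collections import defaultdict


def combine_tweets(tweets, labels, group_size):
    """Concatenate same-label tweets into documents of `group_size` tweets.

    Assumption (not taken from the paper): groups are formed per occupation
    label, in input order, and an incomplete trailing group is discarded.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")

    # Collect the tweets of each occupation, preserving input order.
    by_label = defaultdict(list)
    for text, label in zip(tweets, labels):
        by_label[label].append(text)

    docs, doc_labels = [], []
    for label, texts in by_label.items():
        # Number of tweets that fit into complete groups.
        full = len(texts) - len(texts) % group_size
        for start in range(0, full, group_size):
            docs.append(" ".join(texts[start:start + group_size]))
            doc_labels.append(label)
    return docs, doc_labels


# Toy data invented for illustration; it is not drawn from the dataset.
tweets = [f"doctor tweet {i}" for i in range(10)] + \
         [f"lawyer tweet {i}" for i in range(7)]
labels = ["doctor"] * 10 + ["lawyer"] * 7

docs, doc_labels = combine_tweets(tweets, labels, group_size=5)
```

With `group_size=1` the function returns the individual tweets unchanged, so the same downstream feature extraction and classifier code could be run for all three settings (1, 5, and 10).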

Keywords

Occupation prediction, Profession identification, Machine learning, Natural language processing, Twitter

Acknowledgements

I thank Murat Karabulut for his support during the data collection phase.

Details

Primary Language	Turkish
Subjects	Engineering
Section	Articles
Authors	İslam Mayda 0000-0001-5584-0259
Early Publication Date	26 September 2022
Publication Date	30 September 2022
Published in Issue	Year 2022 Issue: 40

Cite

APA	Mayda, İ. (2022). Türkçe Tweetlerden Makine Öğrenmesi ile Meslek Tahmini. Avrupa Bilim ve Teknoloji Dergisi(40), 55-60. https://doi.org/10.31590/ejosat.1168269
