Research Article

RapidMiner ile Twitter Verilerinin Konu Modellemesi (Topic Modeling of Twitter Data with RapidMiner)

Year 2020, Volume 3, Issue 1, pp. 1-10, 30.06.2020
https://doi.org/10.33721/by.641878

Abstract

In this study, tweets containing specific words were first retrieved from Twitter using RapidMiner. These data were preprocessed, and topic modeling of the tweets was then performed. The “Search Twitter”, “Select Attributes”, and “Nominal to Text” blocks were used for preprocessing. The preprocessed Twitter data were analyzed using the “Tokenize”, “Aggregate”, and “Discretize” operators. The most frequently used words in the tweets were identified, and word groups were formed according to frequency of use. The study then explains how to perform topic-based clustering on Twitter data. For this step, the “Extract Topics From Documents (LDA)” operator, which uses the Latent Dirichlet Allocation model, was used. The most frequently used words in the tweets and the number of tweets posted per user were examined with graphs and tables, and a word cloud was created for the topics obtained from topic modeling.
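The counting steps named above (tokenization, word-frequency aggregation, grouping words by frequency, and tweets per user) can be sketched outside RapidMiner as follows. This is a minimal Python sketch under stated assumptions, not the authors' RapidMiner process. The sample tweets, the stop-word list, the function names, and the band boundaries are all hypothetical. The tokenizer only roughly resembles a non-letter tokenization, and the user-specified upper bounds are just one possible discretization scheme.

```python
import re
from collections import Counter

# Hypothetical toy input: (user, tweet text) pairs standing in for the
# columns kept after retrieval and attribute selection.
TWEETS = [
    ("user_a", "Topic modeling with LDA on Twitter data"),
    ("user_b", "RapidMiner makes text mining on Twitter data easy"),
    ("user_a", "LDA finds topics in tweets"),
]

# Tiny illustrative stop-word list (an assumption, not from the study).
STOP_WORDS = {"with", "on", "in", "the", "a", "makes"}


def tokenize(text):
    """Lowercase the text, split it on runs of non-word characters,
    and drop empty strings and stop words."""
    return [t for t in re.split(r"\W+", text.lower())
            if t and t not in STOP_WORDS]


def word_frequencies(tweets):
    """Count each token across all tweets (an aggregate-by-count step)."""
    counts = Counter()
    for _, text in tweets:
        counts.update(tokenize(text))
    return counts


def discretize(counts, upper_bounds):
    """Map each word to a frequency band: band_i is the first band whose
    upper bound is >= the word's count; larger counts go to the last band."""
    bands = {}
    for word, n in counts.items():
        for i, upper in enumerate(upper_bounds):
            if n <= upper:
                bands[word] = f"band_{i}"
                break
        else:
            bands[word] = f"band_{len(upper_bounds)}"
    return bands


def tweets_per_user(tweets):
    """Number of tweets posted by each user."""
    return Counter(user for user, _ in tweets)


if __name__ == "__main__":
    freqs = word_frequencies(TWEETS)
    print(freqs.most_common(3))
    print(discretize(freqs, upper_bounds=[1, 2]))
    print(tweets_per_user(TWEETS))
```

A `Counter` serves both the per-word and the per-user tallies, so the same `most_common` call can feed a frequency table or a bar chart.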

Topic Modeling of Twitter Data via RapidMiner

Abstract

In this study, tweets containing specific words were first retrieved from Twitter and preprocessed using the RapidMiner software, and the tweets were then clustered using a topic modeling approach. The “Search Twitter”, “Select Attributes”, and “Nominal to Text” blocks were used for preprocessing. The preprocessed data were then analyzed using the “Tokenize”, “Aggregate”, and “Discretize” operators. The most frequently used words were identified, and the words were grouped according to their frequency of use. Next, the study explains how to perform topic-based modeling and clustering on Twitter data. The “Extract Topics From Documents (LDA)” operator, which uses the Latent Dirichlet Allocation model, was used for this process. The most frequently used words in the tweets and the number of tweets per user were examined through tables and graphs. In addition, a word cloud was created for each topic obtained from the topic modeling process.
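In RapidMiner, the topic-modeling step is handled by the “Extract Topics From Documents (LDA)” operator. The sketch below is not that operator's implementation. It is a small collapsed Gibbs sampler, one standard way of fitting Latent Dirichlet Allocation, written in plain Python. The toy documents, the number of topics, the symmetric priors `alpha` and `beta`, the iteration count, and the function names are assumptions chosen for illustration. Each token's topic is resampled from p(z = k | rest) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ). The highest-count words per topic are the kind of input a per-topic word cloud would use.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus: each document is an already-tokenized tweet.
DOCS = [
    ["match", "goal", "team", "goal"],
    ["team", "coach", "match", "win"],
    ["vote", "election", "party", "vote"],
    ["party", "candidate", "election", "poll"],
]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling with symmetric priors.

    Returns (nkw, ndk): per-topic word counts and per-document topic counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * num_topics for _ in docs]               # doc d, topic k
    nkw = [defaultdict(int) for _ in range(num_topics)]  # topic k, word w
    nk = [0] * num_topics                                # tokens in topic k
    z = []                                               # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Sample a new topic from the collapsed full conditional.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta)
                    / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its new topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return nkw, ndk


def top_words(nkw, n=5):
    """Highest-count words per topic (ties broken alphabetically)."""
    return [
        [w for w in sorted(counts, key=lambda word: (-counts[word], word))
         if counts[w] > 0][:n]
        for counts in nkw
    ]


if __name__ == "__main__":
    nkw, ndk = lda_gibbs(DOCS, num_topics=2)
    for k, words in enumerate(top_words(nkw, n=3)):
        print(f"topic {k}: {words}")
```

Fixing the random seed makes runs repeatable. On real tweet data, the number of topics and the priors would need tuning, and a library implementation would normally be preferred.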

Details

Primary Language: Turkish
Subjects: Library and Information Studies
Journal Section: Peer-Reviewed Articles
Authors

Ela Ankaralı 0000-0002-7968-485X

Özgür Külcü 0000-0002-2204-3170

Publication Date: June 30, 2020
Submission Date: November 2, 2019
Published in Issue: Year 2020

Cite

APA Ankaralı, E., & Külcü, Ö. (2020). RapidMiner ile Twitter Verilerinin Konu Modellemesi. Bilgi Yönetimi, 3(1), 1-10. https://doi.org/10.33721/by.641878
