RapidMiner ile Twitter Verilerinin Konu Modellemesi (Topic Modeling of Twitter Data via RapidMiner)

Ela Ankaralı; Özgür Külcü

doi:10.33721/by.641878

Research Article

Year 2020, Volume: 3, Issue: 1, pp. 1-10, 30.06.2020

Cited By: 4

Abstract

In this study, tweets containing specific words were first retrieved from Twitter using RapidMiner; these data were preprocessed, and topic modeling was then applied to the tweets. The “Search Twitter”, “Select Attributes”, and “Nominal to Text” operators were used for preprocessing. The preprocessed Twitter data were analyzed using the “Tokenize”, “Aggregate”, and “Discretize” operators. The most frequently used words in the tweets were identified, and word groups were formed according to frequency of use. The study then explains how to perform topic-based clustering on Twitter data, using the “Extract Topics From Documents (LDA)” operator, which is based on the Latent Dirichlet Allocation model. The most frequently used words and the number of tweets per user were examined with graphs and tables, and a word cloud was created for the topics obtained from the topic modeling.
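The tokenize, aggregate, and discretize steps described above can be sketched outside RapidMiner as well. The following is a minimal Python analogue, not the RapidMiner operators themselves; the sample tweets, the function names, and the bin thresholds are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample data: (user, tweet text) pairs standing in for the
# result of a Twitter search. They are illustrative, not data from the study.
tweets = [
    ("user_a", "Data mining with RapidMiner is fun"),
    ("user_b", "Topic modeling of Twitter data"),
    ("user_a", "Twitter data mining and topic modeling"),
]

def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())

def word_frequencies(rows):
    """Aggregate step: count each token over all tweets."""
    counts = Counter()
    for _user, text in rows:
        counts.update(tokenize(text))
    return counts

def discretize(counts, low_max=1, mid_max=2):
    """Bin words into frequency groups (thresholds are assumed values)."""
    groups = {"low": [], "medium": [], "high": []}
    for word, n in sorted(counts.items()):
        if n <= low_max:
            groups["low"].append(word)
        elif n <= mid_max:
            groups["medium"].append(word)
        else:
            groups["high"].append(word)
    return groups

def tweets_per_user(rows):
    """Count how many tweets each user posted."""
    return Counter(user for user, _text in rows)

freqs = word_frequencies(tweets)
groups = discretize(freqs)
per_user = tweets_per_user(tweets)
```

In practice the thresholds would be chosen from the observed frequency distribution rather than fixed in advance.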

Keywords

RapidMiner, Topic Modeling, Twitter, Data Analysis, Data Mining

References

Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.
Conover, M. D., Gonçalves, B., Ratkiewicz, J., Flammini, A. and Menczer, F. (2011, October). Predicting the Political Alignment of Twitter Users. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing (pp. 192-199). IEEE.
Corley, C., Cook, D., Mikler, A. and Singh, K. (2010). Text and Structural Data Mining of Influenza Mentions in Web and Social Media. International Journal of Environmental Research and Public Health, 7(2), 596-615.
Culotta, A. (2010, July). Towards Detecting Influenza Epidemics by Analyzing Twitter Messages. In Proceedings of the First Workshop on Social Media Analytics (pp. 115-122). ACM.
Earle, P. S., Bowden, D. C. and Guy, M. (2012). Twitter Earthquake Detection: Earthquake Monitoring in a Social World. Annals of Geophysics, 54(6).
Jain, A. P. and Katkar, V. D. (2015). Sentiments Analysis of Twitter Data Using Data Mining. In 2015 International Conference on Information Processing (ICIP) (pp. 807-810). IEEE.
Jiang, K. and Zheng, Y. (2013, December). Mining Twitter Data for Potential Drug Effects. In International Conference on Advanced Data Mining and Applications (pp. 434-443). Springer, Berlin, Heidelberg.
Lamba, M. and Madhusudhan, M. (2018). Application of Topic Mining and Prediction Modeling Tools for Library and Information Science Journals. Library Practices in Digital Era. Eds. MR Murali Prasad et al. Hyderabad: BS Publications, 395-401.
LeCun, Y., Bengio, Y. and Hinton, G. (2015). Deep Learning. Nature, 521(7553), 436-444.
Majid, A., Chen, L., Chen, G., Mirza, H. T., Hussain, I. and Woodward, J. (2013). A Context-Aware Personalized Travel Recommendation System Based on Geotagged Social Media Data Mining. International Journal of Geographical Information Science, 27(4), 662-684.
Mitchell, T. M. (1999). Machine Learning and Data Mining. Communications of the ACM, 42(11).
Tong, Z. and Zhang, H. (2016). A Text Mining Research Based on LDA Topic Modelling. International Conference on Computer Science, Engineering and Information Technology (pp. 201-210).

Topic Modeling of Twitter Data via RapidMiner

Abstract

In this study, tweets containing specific words were first collected from Twitter and preprocessed using RapidMiner; topic modeling was then applied to the tweets. The “Search Twitter”, “Select Attributes”, and “Nominal to Text” operators were used for preprocessing. The preprocessed data were then analyzed using the “Tokenize”, “Aggregate”, and “Discretize” operators. The most frequently used words were identified, and words were grouped according to their frequency of use. The study then explains how to perform topic-based clustering on Twitter data, using the “Extract Topics From Documents (LDA)” operator, which is based on the Latent Dirichlet Allocation model. The most frequently used words and the number of tweets per user were extracted and examined through tables and graphs. In addition, a word cloud was created for the topics obtained from the topic modeling process.
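To make the topic modeling step more concrete, the sketch below fits Latent Dirichlet Allocation with a collapsed Gibbs sampler in plain Python. It is an illustration of the general technique only, not the implementation behind the RapidMiner operator; the tokenized sample documents, the function names `lda_gibbs` and `top_words`, and the hyperparameter values are assumptions.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]      # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                     # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    return vocab, topic_word, doc_topic

def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words of each topic (ties broken by name)."""
    result = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: (-counts[i], vocab[i]))
        result.append([vocab[i] for i in ranked[:n]])
    return result

# Hypothetical tokenized tweets, for illustration only.
docs = [
    ["flu", "vaccine", "health", "flu"],
    ["earthquake", "alert", "city"],
    ["health", "vaccine", "clinic"],
    ["city", "earthquake", "damage", "alert"],
]
vocab, topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
topics = top_words(vocab, topic_word, n=3)
```

The per-topic word lists in `topics` are the kind of output a word cloud would be drawn from. With so little data the topics are unstable, so the result depends on the seed and the number of iterations.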

Keywords

Data Mining, Data Analysis, Topic Modeling, Twitter, RapidMiner

There are 12 references in total.

Details

Primary Language	Turkish
Subjects	Library and Information Studies
Section	Peer-Reviewed Articles
Authors	Ela Ankaralı (ORCID: 0000-0002-7968-485X); Özgür Külcü (ORCID: 0000-0002-2204-3170)
Publication Date	June 30, 2020
Submission Date	November 2, 2019
Published in Issue	Year 2020, Volume: 3, Issue: 1

Cite

APA	Ankaralı, E., & Külcü, Ö. (2020). RapidMiner ile Twitter Verilerinin Konu Modellemesi. Bilgi Yönetimi, 3(1), 1-10. https://doi.org/10.33721/by.641878

Bilgi Yönetimi

Cited By

Unveiling patient-centric interactions in virtual consultation: A comprehensive text mining approach

Health Informatics Journal

https://doi.org/10.1177/14604582251327093

Is There a Difference in the Perception of City in Pre-Pandemic and Peri-Pandemic on Social Media? Case Study from Taiwan

Sage Open

https://doi.org/10.1177/21582440241305609

A Social Media Mining Using Topic Modeling and Sentiment Analysis on Tourism in Malaysia During Covid19

IOP Conference Series: Earth and Environmental Science

https://doi.org/10.1088/1755-1315/704/1/012020

Odun Kompozit Malzemelerle İlgili Şikayetlerin Veri Madenciliği Yöntemleriyle Değerlendirilmesi (Evaluation of Complaints About Wood Composite Materials Using Data Mining Methods)

Sinop Üniversitesi Fen Bilimleri Dergisi

https://doi.org/10.33484/sinopfbd.938500