Research Article

Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi

Year 2018, Volume: 24 Issue: 2, 283 - 291, 30.04.2018

Abstract

Term weighting is an important step that directly affects the results of text
classification. However, in sentiment analysis, which is treated as a text
classification problem, the behavior of a weighting method can change depending
on the preprocessing techniques applied. In this study, methods recently
proposed for different fields such as information retrieval, text
classification, and document filtering were applied to Twitter sentiment
analysis, and their effect on the results was examined. Two different models,
Bag of Words (BoW) and character-level N-grams, were used for feature
extraction. The experiments were carried out on data sets of Turkish and
English Twitter messages. Sentiment classification of the Twitter messages was
performed with a topic model based on Latent Dirichlet Allocation (LDA). The
Support Vector Machine (SVM) algorithm was used in the classification stage.
Based on the experimental results, the most effective term weighting method for
Twitter sentiment analysis studies is recommended.
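
The two feature models named in the abstract, Bag of Words and character-level N-grams, can be illustrated with a minimal sketch. The function names, the whitespace tokenizer, the lowercasing step, and the choice of n = 3 are assumptions made for illustration only; the abstract does not state the preprocessing or the N values used in the study.

```python
from collections import Counter


def bow_features(text):
    """Bag of Words: count lowercase, whitespace-separated tokens.

    Hypothetical helper; the tokenizer is an assumption.
    """
    return Counter(text.lower().split())


def char_ngram_features(text, n=3):
    """Character-level n-grams: count every window of n characters.

    Spaces are kept, so n-grams may span word boundaries.
    """
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


tweet = "good day good mood"
print(bow_features(tweet))
print(char_ngram_features(tweet, n=3))
```

In this toy tweet the word model sees "good" and "mood" as unrelated tokens, while the character model counts the shared trigram "ood" three times. This illustrates how character n-grams can capture sub-word overlap in short, noisy messages.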

The impact of term weighting method on Twitter sentiment analysis

Abstract

Term weighting is an important step that has a direct impact on the results of
classical text classification. However, in sentiment analysis, which is
considered a text classification task, the behavior of a term weighting method
may vary depending on the preprocessing techniques used. In this study, term
weighting methods newly proposed for various research areas, such as
information retrieval, text classification, and document filtering, are applied
to Twitter sentiment analysis to investigate their effect on the results. In
the feature extraction phase, two different models are used: Bag of Words (BoW)
and character-level N-grams. The experiments are conducted on data sets
consisting of Turkish and English Twitter feeds. Sentiment classification of
the Twitter feeds is performed using a topic model generated with the Latent
Dirichlet Allocation (LDA) method. The Support Vector Machine (SVM) algorithm
is employed in the classification stage. Based on the experimental results, the
most effective term weighting method for Twitter sentiment analysis studies is
suggested.
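
As a point of reference for what a term weighting method computes, the sketch below implements classical tf-idf over tokenized documents. It is only an illustrative baseline: the abstract does not list the weighting schemes that were compared, and tf-idf is not claimed here to be the method the study recommends. The function name `tf_idf`, the unsmoothed natural-log idf, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight(t, d) = tf(t, d) * log(N / df(t)), with no smoothing
    (an assumed variant chosen for simplicity).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weighted.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weighted


# Toy corpus, invented for illustration.
docs = [
    "great phone great screen".split(),
    "bad phone".split(),
    "great battery".split(),
]
weights = tf_idf(docs)
for w in weights:
    print(w)
```

A term that occurs in every document receives weight zero under this variant, while a term confined to one document receives the largest idf factor. Supervised schemes additionally use class labels, which this sketch does not.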


Details

Primary Language: Turkish
Subjects: Engineering
Section: Article
Authors

Önder Çoban 0000-0001-9404-2583

Gülşah Tümüklü Özyer 0000-0002-0596-0065

Publication Date: 30 April 2018
Published in Issue: Year 2018 Volume: 24 Issue: 2

Cite

APA Çoban, Ö., & Tümüklü Özyer, G. (2018). Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 24(2), 283-291.
AMA Çoban Ö, Tümüklü Özyer G. Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. April 2018;24(2):283-291.
Chicago Çoban, Önder, and Gülşah Tümüklü Özyer. “Twitter Duygu Analizinde Terim Ağırlıklandırma Yönteminin Etkisi”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24, no. 2 (April 2018): 283-91.
EndNote Çoban Ö, Tümüklü Özyer G (01 April 2018) Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24 2 283–291.
IEEE Ö. Çoban and G. Tümüklü Özyer, “Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi”, Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 24, no. 2, pp. 283–291, 2018.
ISNAD Çoban, Önder - Tümüklü Özyer, Gülşah. “Twitter Duygu Analizinde Terim Ağırlıklandırma Yönteminin Etkisi”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24/2 (April 2018), 283-291.
JAMA Çoban Ö, Tümüklü Özyer G. Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2018;24:283–291.
MLA Çoban, Önder and Gülşah Tümüklü Özyer. “Twitter Duygu Analizinde Terim Ağırlıklandırma Yönteminin Etkisi”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 24, no. 2, 2018, pp. 283-91.
Vancouver Çoban Ö, Tümüklü Özyer G. Twitter duygu analizinde terim ağırlıklandırma yönteminin etkisi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2018;24(2):283-91.





Creative Commons License
This journal is licensed under a Creative Commons Al 4.0 International License.