Finding Influencers on Twitter with Using Machine Learning Classification Algorithms

Mehmet Şimşek; Abdullah Talha Kabakuş

Abstract

Microblog sites are environments where people follow other people. This makes a microblog site a convenient environment for spreading an opinion or introducing a new product. The key point is determining the individuals who maximize the spread. This problem is known as Influence Maximization (IM) and has attracted the attention of many researchers. Many studies in the literature model the IM problem on graphs under different propagation models, such as Independent Cascade (IC) and Linear Threshold (LT). However, microblogs like Twitter have their own features. Many works on IM on Twitter derive new metrics from user and tweet features and then apply a greedy approach to select influencers. In this study, we adopted a different approach and treated IM as a classification problem. First, we collected data on International Women's Day 2018; next, we empirically labeled the users as either influencer candidates or non-influencers; then we applied classification methods to assign each user to one of these classes using user features. In this way, we obtained a set of influencer candidates that is much smaller than the entire dataset. Experimental results show that selecting with the same heuristic (namely MF) from the reduced set of influencer candidates outperforms selecting from the entire dataset.
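The reduce-then-select idea in the abstract can be illustrated with a minimal sketch. The abstract does not list the user features, the classifiers, or the definition of the MF heuristic, so everything concrete here is an assumption made for illustration: the two toy features (followers in thousands, average retweets per tweet), the invented labels, a small k-nearest-neighbour classifier as a stand-in for the classification step, and follower count as a placeholder selection heuristic.

```python
from collections import Counter
from math import dist

# Hypothetical labeled users: (user_id, (followers_in_thousands, avg_retweets), label).
# Features and labels are invented for illustration, not taken from the study.
LABELED = [
    ("a", (9.0, 120.0), "candidate"),
    ("b", (8.5, 95.0), "candidate"),
    ("c", (0.04, 0.0), "non-influencer"),
    ("d", (12.0, 1.0), "non-influencer"),  # many followers, little engagement
    ("e", (0.03, 2.0), "non-influencer"),
]

# Hypothetical unlabeled users to choose seeds from.
UNLABELED = [
    ("u1", (7.0, 80.0)),
    ("u2", (15.0, 0.5)),
    ("u3", (0.05, 3.0)),
    ("u4", (8.8, 110.0)),
]

def knn_predict(train, features, k=3):
    """Majority vote among the k nearest labeled users (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: dist(row[1], features))[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]

def select_seeds(users, score, k):
    """Return the ids of the k users with the highest heuristic score."""
    ranked = sorted(users, key=lambda user: score(user[1]), reverse=True)
    return [user_id for user_id, _ in ranked[:k]]

def reduce_then_select(train, users, score, k):
    """Keep only users classified as candidates, then apply the same heuristic."""
    candidates = [
        (user_id, feats) for user_id, feats in users
        if knn_predict(train, feats) == "candidate"
    ]
    return select_seeds(candidates, score, k)

def follower_score(features):
    """Placeholder heuristic (assumption): rank by follower count only."""
    return features[0]

if __name__ == "__main__":
    print(select_seeds(UNLABELED, follower_score, 1))
    print(reduce_then_select(LABELED, UNLABELED, follower_score, 1))
```

On this toy data the two calls pick different seeds: selecting from all users favours the account with the most followers, while selecting from the classified candidate set skips it because its nearest labeled neighbours are non-influencers. The sketch only shows the mechanics of the two-stage selection; it says nothing about which choice spreads further in practice.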

Keywords

Influence Maximization, Twitter, Social Networks, Microblog, Classification

Twitter Üzerindeki Etkili Bireylerin Makine Öğrenmesi Sınıflandırma Algoritmaları İle Tespiti (Detecting Influential Individuals on Twitter with Machine Learning Classification Algorithms)

Abstract

Microblog sites are environments where people follow one another. This makes a microblog site a convenient environment for spreading an idea or a new product. The key point is identifying the individuals who will maximize the spread. This problem is known as Influence Maximization (IM) and has attracted the interest of many researchers. Many studies in the literature address the IM problem on graphs under the Independent Cascade (IC) and Linear Threshold (LT) propagation models. However, microblog sites such as Twitter have their own characteristics. Many studies that address the IM problem on Twitter develop new metrics from user and tweet features and select influential individuals with a greedy algorithm that uses these metrics. In this study, we applied a different approach to the IM problem and treated it as a classification problem. First, we collected data on International Women's Day 2018; we then empirically labeled users as influential or non-influential; finally, we used classification algorithms to separate individuals into influential and non-influential classes. In this way, we obtained a set of influential individuals considerably smaller than the main dataset. Experimental results show that selecting from the reduced set with the same parameter yields far better results than selecting from the entire dataset.

Keywords

Influence Maximization, Twitter, Social Networks, Microblog, Classification

Details

Primary Language

English

Subjects

Computer Software

Journal Section

Research Article

Authors

Mehmet Şimşek
0000-0002-9797-5028
Türkiye

Abdullah Talha Kabakuş
0000-0003-2181-4292
Türkiye

Publication Date

December 24, 2018

Submission Date

October 8, 2018

Acceptance Date

November 6, 2018

Published in Issue

Year 2018 Volume: 4 Number: 3

IZ

https://izlik.org/JA49YS26TA

Cite

APA

Şimşek, M., & Kabakuş, A. T. (2018). Finding Influencers on Twitter with Using Machine Learning Classification Algorithms. Gazi Journal of Engineering Sciences, 4(3), 183-196. https://izlik.org/JA49YS26TA

AMA

1. Şimşek M, Kabakuş AT. Finding Influencers on Twitter with Using Machine Learning Classification Algorithms. GJES. 2018;4(3):183-196. https://izlik.org/JA49YS26TA

Chicago

Şimşek, Mehmet, and Abdullah Talha Kabakuş. 2018. “Finding Influencers on Twitter With Using Machine Learning Classification Algorithms”. Gazi Journal of Engineering Sciences 4 (3): 183-96. https://izlik.org/JA49YS26TA.

EndNote

Şimşek M, Kabakuş AT (December 1, 2018) Finding Influencers on Twitter with Using Machine Learning Classification Algorithms. Gazi Journal of Engineering Sciences 4 3 183–196.

IEEE

[1] M. Şimşek and A. T. Kabakuş, “Finding Influencers on Twitter with Using Machine Learning Classification Algorithms”, GJES, vol. 4, no. 3, pp. 183–196, Dec. 2018, [Online]. Available: https://izlik.org/JA49YS26TA

ISNAD

Şimşek, Mehmet - Kabakuş, Abdullah Talha. “Finding Influencers on Twitter With Using Machine Learning Classification Algorithms”. Gazi Journal of Engineering Sciences 4/3 (December 1, 2018): 183-196. https://izlik.org/JA49YS26TA.

JAMA

1. Şimşek M, Kabakuş AT. Finding Influencers on Twitter with Using Machine Learning Classification Algorithms. GJES. 2018;4:183–196.

MLA

Şimşek, Mehmet, and Abdullah Talha Kabakuş. “Finding Influencers on Twitter With Using Machine Learning Classification Algorithms”. Gazi Journal of Engineering Sciences, vol. 4, no. 3, Dec. 2018, pp. 183-96, https://izlik.org/JA49YS26TA.

Vancouver

1. Mehmet Şimşek, Abdullah Talha Kabakuş. Finding Influencers on Twitter with Using Machine Learning Classification Algorithms. GJES [Internet]. 2018 Dec. 1;4(3):183-96. Available from: https://izlik.org/JA49YS26TA