TY - JOUR
T1 - Machine Learning Approach for Emotion Identification and Classification in Bitcoin Sentiment Analysis
TT - Bitcoin Duygu Analizinde Duygu Tanıma ve Sınıflandırma için Makine Öğrenmesi Yaklaşımı
AU - Kına, Erol
AU - Biçek, Emre
PY - 2024
DA - December
Y2 - 2024
DO - 10.53433/yyufbed.1532649
JF - Yüzüncü Yıl Üniversitesi Fen Bilimleri Enstitüsü Dergisi
JO - YYU JINAS
PB - Van Yuzuncu Yıl University
WT - DergiPark
SN - 1300-5413
SP - 913
EP - 926
VL - 29
IS - 3
LA - en
AB - Bitcoin is the most valuable cryptocurrency and is renowned for its rapid and volatile price fluctuations in comparison to other currencies. This offers potential for the prediction of Bitcoin prices and has attracted the interest of researchers. Twitter (X) is one of the most widely used social media platforms. The aim of this study is to analyse the sentiment expressed in comments about Bitcoin on the social media platform X using a variety of machine learning algorithms. A variety of machine learning techniques are used to classify user sentiment towards Bitcoin. Moreover, the efficacy of standard bag-of-words and term frequency-inverse document frequency (TF-IDF) methods is evaluated in comparison with machine learning approaches for the purpose of expressing text as numerical vectors. Finally, a keyword ranking was performed to determine the importance of each sentiment in the development of cryptocurrencies. The bag-of-words and TF-IDF methods were used, which facilitate the representation of text-based data. The best result was obtained with the decision trees algorithm (98.74% accuracy) using the TF-IDF method. The bag-of-words method was found to produce better results in general.
KW - Algorithms
KW - Bitcoin
KW - Machine learning
KW - NLP
KW - Sentiment analysis
N2 - Bitcoin is the cryptocurrency with the highest market value and is known for its rapid and volatile price fluctuations compared with other currencies. This presents opportunities for Bitcoin price prediction and attracts the interest of researchers. Twitter (X) is one of the most widely used social media platforms. In this study, the sentiment of X comments about Bitcoin was analysed using machine learning algorithms. Specific machine learning techniques were used to classify user sentiment towards Bitcoin, and the standard bag-of-words and term frequency-inverse document frequency (TF-IDF) methods for expressing text as numerical vectors were compared with machine learning approaches. Finally, a keyword ranking was performed to determine the importance of each sentiment in the development of cryptocurrencies, using the bag-of-words and TF-IDF methods, which facilitate the representation of text-based data. The best result was obtained with the decision trees algorithm (98.74% accuracy) using the TF-IDF method, and the bag-of-words method was found to produce better results in general.
CR - Akhtar, Md. S., Gupta, D., Ekbal, A., & Bhattacharyya, P. (2017). Feature selection and ensemble construction: A two-step method for aspect-based sentiment analysis. Knowledge-Based Systems, 125, 116–135. https://doi.org/10.1016/j.knosys.2017.03.020
CR - Alasmari, S. F., & Dahab, M. (2017). Sentiment detection, recognition and aspect identification. International Journal of Computer Applications, 177(2), 31–38. https://doi.org/10.5120/ijca2017915675
CR - Alnaied, A., Elbendak, M., & Bulbul, A. (2020). An intelligent use of stemmer and morphology analysis for Arabic information retrieval. Egyptian Informatics Journal, 21(4), 209–217. https://doi.org/10.1016/j.eij.2020.02.004
CR - Andhale, S., Mane, P., Vaingankar, D. C., Karia, K., & Talele, K. (2021). Twitter sentiment analysis for COVID-19. In 2021 International Conference on Communication Information and Computing Technology (ICCICT) (pp. 1–12), Mumbai, India.
https://doi.org/10.1109/iccict50803.2021.9509933
CR - Anonymous. (2023a). Bitcoin sentiment analysis | Twitter data. Kaggle. Access date: July 23, 2023. https://www.kaggle.com/datasets/gautamchettiar/bitcoin-sentiment-analysis-twitter-data
CR - Anonymous. (2023b). Scikit-learn/scikit-learn: Scikit-learn 0.22.1. Access date: July 23, 2023. https://doi.org/10.5281/zenodo.3596890
CR - Avcı, İ., & Koca, M. (2023). Predicting DDoS attacks using machine learning algorithms in building management systems. Electronics, 12(19), 4142. https://doi.org/10.3390/ELECTRONICS12194142
CR - Barros, D. P., Moura, J., Freire, C. R., Taleb, A. C., De Medeiros Valentim, R. A., & De Morais, P. S. G. (2020). Machine learning applied to retinal image processing for glaucoma detection: Review and perspective. Biomedical Engineering Online, 19(1), 20. https://doi.org/10.1186/s12938-020-00767-2
CR - Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/a:1010933404324
CR - Bulu, B., Yağar, F., Kopmaz, B., Şişman Kitapçı, N., Kitapçı, O., Aksu Kılıç, P., Köksal, L., & Mumcu, G. (2019). The content of Twitter messages of different health groups: The role of social media in health. International Journal of Health Management and Tourism, 4(3), 228–236. https://doi.org/10.31201/ijhmt.644197
CR - Cambria, E., Olsher, D., & Rajagopal, D. (2014). SenticNet 3: A common and common-sense knowledge base for cognition-driven sentiment analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1). https://doi.org/10.1609/aaai.v28i1.8928
CR - Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). https://doi.org/10.1145/2939672.2939785
CR - Chicco, D., & Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.
BMC Genomics, 21(1). https://doi.org/10.1186/s12864-019-6413-7
CR - Dass, S., Kannan, V. K., & Shyamala, K. (2020). Sentiment severity on location-based social network (LBSN) data of natural disasters. International Journal of Recent Technology and Engineering, 8(5), 6–12. https://doi.org/10.35940/ijrte.e6631.018520
CR - Domingos, P., & Pazzani, M. (2017). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–130. https://rdcu.be/dgWb2
CR - Dutta, A., Kumar, S., & Basu, M. (2020). A Gated Recurrent Unit approach to Bitcoin price prediction. Journal of Risk and Financial Management, 13(2), 23. https://doi.org/10.3390/jrfm13020023
CR - Elbagir, S., & Yang, J. (2019). Twitter sentiment analysis using natural language toolkit and VADER sentiment. In Proceedings of the International Multiconference of Engineers and Computer Scientists (Vol. 122, p. 16).
CR - Fakieh, B., Al-Ghamdi, A. S. A.-M., Saleem, F., & Ragab, M. (2023). Optimal machine learning driven sentiment analysis on COVID-19 Twitter data. Computers, Materials & Continua, 75(1), 81–97. https://doi.org/10.32604/cmc.2023.033406
CR - Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4), 82–89. https://doi.org/10.1145/2436256.2436274
CR - Georgoula, I., Pournarakis, D., Bilanakos, C., Sotiropoulos, D. N., & Giaglis, G. M. (2015). Using time-series and sentiment analysis to detect the determinants of Bitcoin prices. Social Science Research Network. https://doi.org/10.2139/ssrn.2607167
CR - Gozbasi, O. (2021, July 12). Is Bitcoin a safe haven? A study on the factors that affect Bitcoin prices. International Journal of Economics and Financial Issues, 11(4), 35–40. https://econjournals.com/index.php/ijefi/article/view/11602
CR - Greaves, F., Ramirez-Cano, D., Millett, C., Darzi, A., & Donaldson, L. (2013). Use of sentiment analysis for capturing patient experience from free-text comments posted online.
Journal of Medical Internet Research, 15(11), e239. https://doi.org/10.2196/jmir.2721
CR - Hâkim, A., Erwin, A., Eng, K., Galinium, M., & Muliady, W. (2014). Automated document classification for news article in Bahasa Indonesia based on term frequency inverse document frequency (TF-IDF) approach. In 6th International Conference on Information Technology and Electrical Engineering (ICITEE) (pp. 1–4).
CR - Hasan, K. M. A., Shovon, S. D., Joy, N. H., & Islam, S. (2021). Automatic labeling of Twitter data for developing COVID-19 sentiment dataset. In 2021 5th International Conference on Electrical Information and Communication Technology (EICT) (pp. 1–6). https://doi.org/10.1109/eict54103.2021.9733548
CR - Ibrahim, A. (2021). Forecasting the early market movement in Bitcoin using Twitter’s sentiment analysis: An ensemble-based prediction model. In 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS) (pp. 1–5). https://doi.org/10.1109/iemtronics52119.2021.9422647
CR - Joachims, T. (1999). Svmlight: Support vector machine. SVM-Light Support Vector Machine, University of Dortmund, 19(4), 25. http://svmlight.joachims.org/
CR - Kinderis, M., Bezbradica, M., & Crane, M. (2018). Bitcoin currency fluctuation. In Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (pp. 31–41). https://doi.org/10.5220/0006794000310041
CR - Kleinbaum, D. G., & Klein, M. (2002). Analysis of matched data using logistic regression. In Logistic Regression (pp. 227–265). Springer eBooks. https://doi.org/10.1007/0-387-21647-2_8
CR - Kranjc, J., Smailović, J., Podpečan, V., Grčar, M., Žnidaršič, M., & Lavrač, N. (2015). Active learning for sentiment analysis on data streams: Methodology and workflow implementation in the ClowdFlows platform. Information Processing and Management, 51(2), 187–203. https://doi.org/10.1016/j.ipm.2014.04.001
CR - Larkey, L. S., Ballesteros, L., & Connell, M. E. (2007).
Light stemming for Arabic information retrieval. In: Soudi, A., Bosch, A.v., Neumann, G. (eds) Arabic computational morphology. Text, speech and language technology, vol 38. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6046-5_12
CR - LaValley, M. P. (2008). Logistic regression. Circulation, 117(18), 2395–2399. https://doi.org/10.1161/circulationaha.106.682658
CR - Li, T., Chamrajnagar, A. S., Fong, X. R., Rizik, N. R., & Fu, F. (2019). Sentiment-based prediction of alternative cryptocurrency price fluctuations using gradient boosting tree model. Frontiers in Physics, 7. https://doi.org/10.3389/fphy.2019.00098
CR - Loria, S. (2018). Textblob Documentation (Release 0.15, 2[8], 269).
CR - McGrath, M. (2023). Python in easy steps. Access date: July 23, 2023. https://openlibrary.org/books/OL26976831M/Python_in_easy_steps
CR - Mitchell, R., & Frank, E. (2017). Accelerating the XGBoost algorithm using GPU computing. PeerJ, 3, e127. https://doi.org/10.7717/peerj-cs.127
CR - Murphy, K. P. (2006). Naïve Bayes classifiers. University of British Columbia, 18(60), 1–8.
CR - Narkhede, S. (2018). Understanding AUC-ROC curve. Towards Data Science, 26(1), 220–227.
CR - Neethu, M. S., & Rajasree, R. (2013). Sentiment analysis in Twitter using machine learning techniques. In 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT) (pp. 1–5). https://doi.org/10.1109/icccnt.2013.6726818
CR - Nowozin, S., Rother, C., Bagon, S., Sharp, T., Yao, B., & Kohli, P. (2011). Decision tree fields. In International Conference on Computer Vision (pp. 1668–1675). https://doi.org/10.1109/iccv.2011.6126429
CR - Patel, N., Parekh, B., Thakkar, N., Gupta, R., Tanwar, S., Sharma, G., & Sharma, R. (2022). Fusion in cryptocurrency price prediction: A decade survey on recent advancements, architecture, and potential future directions. IEEE Access, 10, 34511–34538. https://doi.org/10.1109/access.2022.3163023
CR - Pradana, A. T., & Hayaty, M. (2019).
The effect of stemming and removal of stopwords on the accuracy of sentiment analysis on Indonesian-language texts. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 4(4), 375–380. https://doi.org/10.22219/kinetik.v4i4.912
CR - Quinlan, J. R. (1992). C4.5: Programs for Machine Learning. https://cds.cern.ch/record/2031749
CR - Raaijmakers, J. G., & Shiffrin, R. M. (1992). Models for recall and recognition. Annual Review of Psychology, 43(1), 205–234.
CR - Rahman, S., Hemel, J. N., Anta, S. J. A., & Muhee, H. A. (2018). Sentiment analysis using R: An approach to correlate Bitcoin price fluctuations with change in user sentiments. BRAC University Institutional Repository. http://dspace.bracu.ac.bd/xmlui/handle/10361/10163
CR - Rigatti, S. J. (2017). Random Forest. Journal of Insurance Medicine, 47(1), 31–39. https://doi.org/10.17849/insm-47-01-31-39.1
CR - Sallis, J., Gripsrud, G., Olsson, U., & Silkoset, R. (2021). Research methods and data analysis for business decisions. Springer International Publishing. https://doi.org/10.1007/978-3-030-84421-9
CR - Sami, O., Elsheikh, Y., & Almasalha, F. (2021). The role of data pre-processing techniques in improving machine learning accuracy for predicting coronary heart disease. International Journal of Advanced Computer Science and Applications, 12(6). https://doi.org/10.14569/ijacsa.2021.0120695
CR - Sammut, C., & Webb, G. I. (2011). Encyclopedia of Machine Learning. Springer Science & Business Media.
CR - Shah, D., & Zhang, K. (2014). Bayesian regression and Bitcoin. In 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 409–414). https://doi.org/10.1109/allerton.2014.7028484
CR - Suthaharan, S. (2016). Machine learning models and algorithms for big data classification. Springer Nature. https://doi.org/10.1007/978-1-4899-7641-3
CR - Tanwar, S., Patel, N. A., Patel, S., Patel, J., Sharma, G., & Davidson, I. E. (2021).
Deep Learning-Based Cryptocurrency Price Prediction Scheme with Inter-Dependent relations. IEEE Access, 9, 138633–138646. https://doi.org/10.1109/access.2021.3117848
CR - Vumazonke, N., & Parsons, S. (2023). An analysis of South Africa’s guidance on the income tax consequences of crypto assets. South African Journal of Economic and Management Sciences, 26(1). https://doi.org/10.4102/sajems.v26i1.4832
CR - Wang, H., Yao, Y., & Salhi, S. (2020). Tension in big data using machine learning: Analysis and applications. Technological Forecasting and Social Change, 158, 120175. https://doi.org/10.1016/j.techfore.2020.120175
CR - Yogish, D., Manjunath, T. N., & Hegadi, R. S. (2019). Review on Natural Language Processing Trends and Techniques using NLTK. In Communications in Computer and Information Science (pp. 589–606). https://doi.org/10.1007/978-981-13-9187-3_53
CR - Zhang, Y., Jin, R., & Zhou, Z. (2010). Understanding bag-of-words model: A statistical framework. International Journal of Machine Learning and Cybernetics, 1(1–4), 43–52. https://doi.org/10.1007/s13042-010-0001-0
UR - https://doi.org/10.53433/yyufbed.1532649
L1 - https://dergipark.org.tr/en/download/article-file/4142442
ER -