CLASSIFICATION OF USER COMMENTS IN A MOBILE APPLICATION USING DATA AUGMENTATION WITH MACHINE LEARNING TECHNIQUES

Özer Çelik; Gürkan Kaplan

doi:10.21923/jesd.906211

Research Article

Year 2021, Volume: 9 Issue: 4, 1398 - 1407, 20.12.2021

Abstract

With the growing use of social media in recent years, almost every topic attracts more comments than anyone can follow. These comments contain both important and unimportant information, yet it is almost impossible to keep up with so many of them. In this study, user comments on the Anadolu University mobile application were classified using text classification. The goal was to predict whether each comment concerned the content or the application itself. In addition, the effect of oversampling and undersampling on text classification performance was investigated. For this purpose, the synthetic minority oversampling technique (SMOTE), the condensed nearest neighbor (CNN) undersampling technique, and the random undersampling (RUS) technique were applied to the data set. A total of 1008 user comments received from the mobile application were classified using these techniques. With SMOTE oversampling, the ANN algorithm gave the best classification, with 93.57% accuracy. With CNN undersampling, the Random Forest algorithm gave the best classification, with 72.22% accuracy. With random undersampling (RUS), the Extreme Gradient Boosting algorithm gave the best classification, with 84.44% accuracy.
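The two simpler resampling ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature vectors, the class sizes, and the helper names `random_undersample` and `smote_oversample` are invented for illustration, and the code uses only the Python standard library rather than any particular resampling package. Condensed nearest neighbor undersampling is not shown.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def smote_oversample(X, y, k=2, seed=0):
    """SMOTE-style oversampling: add synthetic rows for each smaller class by
    interpolating between a row and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority_size = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [f for f, l in zip(X, y) if l == label]
        if len(rows) < 2:
            continue  # interpolation needs at least two rows of the class
        for _ in range(majority_size - count):
            base = rng.choice(rows)
            # Nearest same-class neighbours by squared Euclidean distance,
            # excluding the base row itself.
            neighbours = sorted(
                (r for r in rows if r is not base),
                key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy, invented 2-D feature vectors standing in for vectorized comments.
# The 6-to-2 class split is an assumption, not the study's distribution.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1],
     [1.0, 1.1], [1.2, 0.9]]
y = ["application"] * 6 + ["content"] * 2

X_over, y_over = smote_oversample(X, y, k=1)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes now have 6 rows
print(Counter(y_under))  # both classes now have 2 rows
```

Oversampling keeps every original row and adds synthetic ones, while undersampling discards rows from the larger class, so the two approaches can leave a classifier with very different amounts of training data.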

Keywords

Text classification, Machine learning, Artificial intelligence, Natural language processing

MAKİNE ÖĞRENMESİ TEKNİKLERİ İLE VERİ ÇOĞALTMA KULLANARAK BİR MOBİL UYGULAMADA KULLANICI YORUMLARININ SINIFLANDIRILMASI


Abstract

With the increase in social media use in recent years, almost every topic has more comments than can be followed. These comments include both positive and negative ones. However, following so many comments is almost impossible today. In this study, user comments made on the open-access mobile application of Anadolu University were classified as text using various machine learning techniques. The aim was to predict whether the comments made on the application were related to the content or to the application. In addition, the effect of oversampling and undersampling on text classification performance was examined. For this purpose, the synthetic minority oversampling technique (SMOTE), the condensed nearest neighbor undersampling technique (CNN), and the random undersampling technique (RUS) were applied to the data set. The 1008 user comments taken from the mobile application were processed and classified in terms of content and application. With SMOTE oversampling, the ANN algorithm was found to give the best classification, with 93.57% accuracy. With the CNN technique, the Random Forest algorithm was found to give the best classification, with 72.22% accuracy. With the RUS technique, Extreme Gradient Boosting was found to give the best classification, with 84.44% accuracy.

Keywords

Text classification, Machine learning, Artificial intelligence, Natural language processing

Details

Primary Language	English
Subjects	Computer Software
Journal Section	Research Articles
Authors	Özer Çelik (ORCID 0000-0002-4409-3101); Gürkan Kaplan (ORCID 0000-0002-6393-5546)
Publication Date	December 20, 2021
Submission Date	March 30, 2021
Acceptance Date	August 10, 2021
Published in Issue	Year 2021 Volume: 9 Issue: 4

Cite

APA	Çelik, Ö., & Kaplan, G. (2021). CLASSIFICATION OF USER COMMENTS IN A MOBILE APPLICATION USING DATA AUGMENTATION WITH MACHINE LEARNING TECHNIQUES. Mühendislik Bilimleri Ve Tasarım Dergisi, 9(4), 1398-1407. https://doi.org/10.21923/jesd.906211
