Comparison of Topic Modeling Methods
Year 2022, Volume 3, Issue 2, pp. 46-53, 31.05.2022
Ahmet Kaya, Eyyüp Gülbandılar
Abstract
The amount of data produced on the internet keeps growing, and a large share of it is text. This predominance of text has led researchers to work more intensively in the area, and Topic Modeling (TM) methods are the most popular approach to textual data. Topic modeling methods aim to identify the latent or explicit topics contained in texts. In this study, Latent Dirichlet Allocation (LDA), the Correlated Topic Model (CTM) and the Structural Topic Model (STM) were applied to the collected text datasets, and topic coherence and perplexity were used to compare their results. Our results were similar to those reported for the same methods in the publication on which this study is based. The coherence score, which we used in addition to perplexity, likewise showed that STM produced better results, with the STM type 3 model achieving the best coherence score of 0.509. The study also presents a comparison procedure that future studies can follow.
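The abstract compares the models by perplexity and topic coherence but does not spell out how either is computed. The Python sketch below shows one common formulation of each, assuming tokenised documents and, for perplexity, per-document log-likelihoods (natural log) supplied by whichever model is being evaluated. The coherence measure used here (UMass, averaged over word pairs) is an assumption for illustration: the abstract does not state which coherence measure produced the 0.509 score, and UMass values are usually negative, so they are not on the same scale. The function names and the toy corpus are hypothetical.

```python
import math
from itertools import combinations


def perplexity(doc_log_likelihoods, total_tokens):
    """Perplexity = exp(-(sum of per-document log-likelihoods) / total token count).

    The natural-log likelihoods must come from the topic model being evaluated;
    how they are estimated (e.g. on held-out documents) is model-specific.
    """
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def umass_coherence(top_words, documents):
    """UMass coherence of one topic, averaged over word pairs (assumed variant).

    For each pair where w_j ranks above w_i among the topic's top words,
    score log((D(w_i, w_j) + 1) / D(w_j)), where D counts the documents that
    contain all the given words. Assumes every top word occurs in at least
    one document; otherwise D(w_j) is zero and the division fails.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    # combinations yields index pairs (j, i) with j < i, so w_j is ranked higher
    for j, i in combinations(range(len(top_words)), 2):
        w_j, w_i = top_words[j], top_words[i]
        scores.append(math.log((doc_freq(w_i, w_j) + 1) / doc_freq(w_j)))
    return sum(scores) / len(scores)


# Toy corpus of tokenised documents (invented for illustration)
docs = [
    ["hotel", "room", "clean"],
    ["hotel", "staff", "friendly"],
    ["room", "clean", "bed"],
    ["beach", "view", "room"],
]
print(round(umass_coherence(["room", "clean", "hotel"], docs), 3))  # -0.135
print(round(perplexity([-20.0, -25.0], total_tokens=10), 3))        # 90.017
```

Lower perplexity means the model assigns higher probability to the evaluated text; for UMass, scores closer to zero mean the topic's top words co-occur more often in the corpus.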