Kelime Vektörü Yöntemlerinin Model Oluşturma Sürelerinin Karşılaştırılması

Metin Bilgin

doi:10.17671/gazibtd.472226

Research Article

Year 2019, Volume: 12 Issue: 2, 141 - 146, 30.04.2019

Cited By: 4

Abstract

In this study, two different datasets created for sentiment analysis were modeled with Word2Vec, a word vector algorithm. The models were built with the two Word2Vec methods, CBoW (Continuous Bag of Words) and Skip-Gram. The arithmetic mean is generally used to build a model of a text with Word2Vec. In this study, three different methods are proposed for modeling a text with both CBoW and Skip-Gram. Model building (training) times were measured for both methods. The experiments show that CBoW outperforms Skip-Gram in terms of modeling time.

Keywords

Word2Vec, modeling time, training time, CBoW, Skip-Gram, natural language processing

Comparison of Modeling Time of Word Vector Methods

Abstract

In this study, two different datasets created for sentiment analysis were modeled with Word2Vec, a word vector algorithm. Two Word2Vec methods, CBoW (Continuous Bag of Words) and Skip-Gram, were used to build the models. The arithmetic mean is generally used to model a text with Word2Vec. In this study, three different methods for modeling a text are proposed for both CBoW and Skip-Gram. The modeling time (training time) was measured for both methods. The experiments show that CBoW outperforms Skip-Gram in terms of modeling time.
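The two ideas named in the abstract can be illustrated with a small sketch. It is not the study's implementation: the toy vectors, the function names and the window size are assumptions made only for illustration. The first function models a text as the arithmetic mean of its word vectors. The other two list the training examples that CBoW and Skip-Gram would draw from the same sentence; Skip-Gram produces one example per target/context pair, which is one commonly cited reason why it tends to take longer to train.

```python
from typing import Dict, List, Sequence, Tuple

# Toy 3-dimensional word vectors. The values are invented for illustration
# and do not come from a trained Word2Vec model or from the study.
TOY_VECTORS: Dict[str, List[float]] = {
    "good": [1.0, 0.0, 2.0],
    "movie": [0.0, 2.0, 2.0],
    "bad": [-1.0, 1.0, 0.0],
}


def mean_text_vector(tokens: Sequence[str],
                     vectors: Dict[str, List[float]]) -> List[float]:
    """Model a text as the dimension-wise mean of its known word vectors."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token of the text has a vector")
    # zip(*known) groups the values of each dimension together.
    return [sum(column) / len(column) for column in zip(*known)]


def cbow_examples(tokens: List[str],
                  window: int) -> List[Tuple[List[str], str]]:
    """CBoW view: one (context words -> target word) example per position."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples


def skipgram_examples(tokens: List[str],
                      window: int) -> List[Tuple[str, str]]:
    """Skip-Gram view: one (target word -> context word) example per pair."""
    return [(target, word)
            for context, target in cbow_examples(tokens, window)
            for word in context]


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "good"]
    print(mean_text_vector(sentence, TOY_VECTORS))
    print(len(cbow_examples(sentence, window=1)))
    print(len(skipgram_examples(sentence, window=1)))
```

With a window of 1, the four-word sentence yields 4 CBoW examples and 6 Skip-Gram examples, and the gap widens as the window grows. The sketch only counts examples; it does not reproduce the training times measured in the study.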

Keywords

Word2Vec, modeling time, training time, CBoW, Skip-Gram, natural language processing

There are 19 citations in total.

Details

Primary Language	Turkish
Subjects	Computer Software
Journal Section	Articles
Authors	Metin Bilgin 0000-0002-4216-0542
Publication Date	April 30, 2019
Submission Date	October 19, 2018
Published in Issue	Year 2019 Volume: 12 Issue: 2

Cite

APA	Bilgin, M. (2019). Kelime Vektörü Yöntemlerinin Model Oluşturma Sürelerinin Karşılaştırılması. Bilişim Teknolojileri Dergisi, 12(2), 141-146. https://doi.org/10.17671/gazibtd.472226

Journal of Information Technologies

Cited By

Detection of Turkish Fake News in Twitter with Machine Learning Algorithms

Arabian Journal for Science and Engineering

Suleyman Gokhan Taskin

https://doi.org/10.1007/s13369-021-06223-0

Eğitim İçerikleri için Sezgisel Metin Bölütlemeye Dayalı Çoklu Etiketleme Stratejisi: M.E.B. Sanat Tarihi Kitabı için Bir Durum Çalışması

Bilişim Teknolojileri Dergisi

https://doi.org/10.17671/gazibtd.1026142

Türkçe Haber Metinlerinin Konvolüsyonel Sinir Ağları ve Word2Vec Kullanılarak Sınıflandırılması

Bilişim Teknolojileri Dergisi

https://doi.org/10.17671/gazibtd.457917

Türkiye'de Emsal Kararların Makine Öğrenmesi Algoritmaları ile Sınıflandırılması

Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji

https://doi.org/10.29109/gujsc.1668535