Co-occurrence Weight Selection for Word Embeddings to Enhance Test Performance

Aykut Koç; Veysel Yücesoy

doi:10.17482/uumfd.318615

Research Article

Year 2018, Volume: 23 Issue: 1, 31 - 40, 05.04.2018

Abstract

This study revisits the problem of maximizing the performance of mathematical
word representations on a given task. Count-based methods for generating word
representations, which rely on word co-occurrence statistics, conventionally
use raw counts as co-occurrence weights. The study proposes alternative weights
in place of these counts, with the aim of improving performance on analogy and
similarity tasks. Turkish was selected as the language of study. To handle the
root-and-suffix structure of Turkish, each suffixed word form was treated as a
separate word when the corpus was compiled. The performance of the proposed
co-occurrence weights is analyzed with respect to the varying parameter, and
the results are presented in the paper.
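
The idea of replacing plain counts with weighted co-occurrence contributions can be illustrated with a minimal sketch. The abstract does not specify the proposed weights or name the varied parameter, so the distance-decay weight 1 / d**alpha, the parameter name `alpha`, the function name `cooccurrence_counts`, and the toy tokens below are all assumptions chosen for illustration, not the authors' method. Setting alpha = 0 recovers conventional counting.

```python
from collections import defaultdict


def cooccurrence_counts(tokens, window=5, alpha=1.0):
    """Accumulate weighted co-occurrence values within a symmetric window.

    Two tokens that are d positions apart contribute 1 / d**alpha.
    This weight is a hypothetical example, not the weighting from the paper:
    alpha = 0 gives plain counting, larger alpha discounts distant pairs more.
    """
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            weight = 1.0 / d ** alpha
            # Store both directions so the table is symmetric.
            counts[(word, tokens[j])] += weight
            counts[(tokens[j], word)] += weight
    return dict(counts)


# Each suffixed surface form ("ev", "evde", "evden") is its own vocabulary
# entry, mirroring the corpus treatment described in the abstract.
tokens = "ev evde evden ev evde".split()
plain = cooccurrence_counts(tokens, window=2, alpha=0.0)
decayed = cooccurrence_counts(tokens, window=2, alpha=1.0)
for pair in sorted(plain):
    print(pair, plain[pair], decayed[pair])
```

Comparing `plain` with `decayed` shows how a single parameter changes the statistics that a count-based embedding method would be trained on; evaluating analogy and similarity performance for each setting is outside the scope of this sketch.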

Keywords

Word embeddings, Natural language processing, Statistical linguistics

Title in Turkish: KELİME TEMSİLLERİ İÇİN TEST PERFORMANSINI GELİŞTİRMEYE YÖNELİK EŞDİZİMLİLİK AĞIRLIKLARININ SEÇİMİ

Details

Primary Language	English
Subjects	Engineering
Section	Research Articles
Authors	Aykut Koç Veysel Yücesoy
Publication Date	April 5, 2018
Submission Date	June 5, 2017
Acceptance Date	February 7, 2018
Published in Issue	Year 2018 Volume: 23 Issue: 1

Cite

APA	Koç, A., & Yücesoy, V. (2018). Co-occurrence Weight Selection for Word Embeddings to Enhance Test Performance. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, 23(1), 31-40. https://doi.org/10.17482/uumfd.318615
AMA	Koç A, Yücesoy V. Co-occurrence Weight Selection for Word Embeddings to Enhance Test Performance. UUJFE. April 2018;23(1):31-40. doi:10.17482/uumfd.318615
Chicago	Koç, Aykut, and Veysel Yücesoy. “Co-Occurrence Weight Selection for Word Embeddings to Enhance Test Performance”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 23, no. 1 (April 2018): 31-40. https://doi.org/10.17482/uumfd.318615.
EndNote	Koç A, Yücesoy V (01 April 2018) Co-occurrence Weight Selection for Word Embeddings to Enhance Test Performance. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 23 1 31–40.
IEEE	A. Koç and V. Yücesoy, “Co-occurrence Weight Selection for Word Embeddings to Enhance Test Performance”, UUJFE, vol. 23, no. 1, pp. 31–40, 2018, doi: 10.17482/uumfd.318615.
ISNAD	Koç, Aykut - Yücesoy, Veysel. “Co-Occurrence Weight Selection for Word Embeddings to Enhance Test Performance”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 23/1 (April 2018), 31-40. https://doi.org/10.17482/uumfd.318615.
JAMA	Koç A, Yücesoy V. Co-occurrence Weight Selection for Word Embeddings to Enhance Test Performance. UUJFE. 2018;23:31–40.
MLA	Koç, Aykut and Veysel Yücesoy. “Co-Occurrence Weight Selection for Word Embeddings to Enhance Test Performance”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, vol. 23, no. 1, 2018, pp. 31-40, doi:10.17482/uumfd.318615.
Vancouver	Koç A, Yücesoy V. Co-occurrence Weight Selection for Word Embeddings to Enhance Test Performance. UUJFE. 2018;23(1):31-40.
