Olasılıksal Yöntemler ile Türkçe Metinlerin Anlamsal Benzerliğinin Belirlenmesi

Engin Yıldıztepe; Volkan Uzun

doi:10.33484/sinopfbd.350445

Research Article

Determination of the Semantic Similarity of Turkish Texts Using Probabilistic Methods

Year 2018, Volume: 3, Issue: 2, pp. 66-78, 28.12.2018

Cited By: 3

https://izlik.org/JA28SJ59TY

Abstract

Text mining is the process of deriving useful information from unstructured text data using statistical and mathematical methods. Major text mining tasks include text categorization, text clustering, concept extraction, document summarization, semantic similarity analysis, and author identification. This study addresses semantic similarity analysis, which aims to determine how close texts are in meaning. Probabilistic latent semantic analysis and latent Dirichlet allocation are probabilistic methods used for this purpose. The study examines semantic similarity analysis with these two methods and discusses an application that classifies Turkish texts, selected from different news agencies, according to their semantic similarity. The R statistical programming language and Matlab were used in the application.
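To make the idea behind probabilistic latent semantic analysis more concrete, the following is a minimal sketch of its EM fitting loop, in which each document is modeled as a mixture of topics, P(w|d) = Σ_z P(z|d) P(w|z). It is not the authors' implementation: the study reports using R and Matlab, while this sketch uses plain Python, and the function name `plsa`, its parameters, and the toy count matrix are all hypothetical.

```python
import math
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix (list of lists).

    Assumes every document contains at least one word.
    Returns P(z|d), P(w|z), and the log-likelihood before each update.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random, strictly positive initialization of P(z|d) and P(w|z).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    log_liks = []

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                p_w_given_d = sum(joint)
                log_lik += n * math.log(p_w_given_d)
                for z in range(n_topics):
                    resp = n * joint[z] / p_w_given_d
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalize the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
        log_liks.append(log_lik)
    return p_z_d, p_w_z, log_liks


# Toy data (hypothetical): 4 documents over a 6-word vocabulary.
toy_counts = [
    [4, 3, 2, 0, 0, 0],
    [3, 4, 1, 0, 1, 0],
    [0, 0, 1, 3, 4, 2],
    [0, 1, 0, 4, 3, 3],
]
doc_topics, topic_words, history = plsa(toy_counts, n_topics=2)
for d, mix in enumerate(doc_topics):
    print(d, [round(p, 3) for p in mix])
```

The rows of `doc_topics` are the per-document topic mixtures P(z|d); comparing these mixtures, rather than raw word counts, is one way to compare texts by meaning.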

References

Hofmann T, 2001. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42: 177-196.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R, 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6): 391-407.
Blei DM, Ng AY, Jordan MI, 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3: 993-1022.
Dempster AP, Laird NM, Rubin DB, 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1): 1-38.
Zemberek NLP, http://zemberek-web.appspot.com/ [accessed 03/2014]
Grün B, Hornik K, 2011. topicmodels: An R package for fitting topic models. Journal of Statistical Software, 40(13): 1-30.
Porter MF, 1980. An algorithm for suffix stripping. Program, 14(3): 130-137.

Abstract (translated from the Turkish original)

Text mining aims to extract meaningful information from unstructured text data using mathematical and statistical methods. Text classification, clustering, opinion detection, summarization, semantic similarity detection, and author identification are the main areas of text mining. Semantic similarity analysis, the subject of this study, seeks to determine the semantic closeness between texts. Probabilistic latent semantic analysis and latent Dirichlet allocation are probabilistic methods used to determine the semantic similarity between texts. This study examines semantic similarity using probabilistic latent semantic analysis and latent Dirichlet allocation, and discusses an application carried out to classify Turkish texts selected from different news agencies according to their semantic similarity. The R statistical programming language and Matlab were used in the application.
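Classifying texts by semantic similarity can be illustrated with a small sketch that compares documents through their topic mixtures, such as those produced by probabilistic latent semantic analysis or latent Dirichlet allocation. The abstract does not say which similarity measure or classification rule the study used, so the cosine similarity and nearest-neighbor rule below are assumptions, and the function names, category labels, and topic mixtures are hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity of two non-zero vectors, e.g. topic mixtures."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def classify_by_similarity(query, labeled):
    """Return the label whose topic mixture is most similar to `query`.

    `labeled` is a list of (label, topic_mixture) pairs.
    """
    best_label, _ = max(
        ((label, cosine_similarity(query, mix)) for label, mix in labeled),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical topic mixtures for three labeled news texts.
labeled_docs = [
    ("sports", [0.8, 0.1, 0.1]),
    ("economy", [0.1, 0.8, 0.1]),
    ("politics", [0.1, 0.1, 0.8]),
]
# Hypothetical topic mixture of an unlabeled text.
new_doc = [0.7, 0.2, 0.1]
print(classify_by_similarity(new_doc, labeled_docs))  # prints "sports"
```

Because topic mixtures are probability vectors, other measures such as Hellinger distance or Jensen-Shannon divergence could be substituted for cosine similarity without changing the rest of the sketch.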

Keywords

Semantic similarity, Latent semantic analysis, Latent Dirichlet allocation, Text mining

Details

Subjects	Engineering
Section	Research Article
Authors	Engin Yıldıztepe, Volkan Uzun
Submission Date	10 November 2017
Publication Date	28 December 2018
DOI	https://doi.org/10.33484/sinopfbd.350445
IZ	https://izlik.org/JA28SJ59TY
Published in Issue	Year 2018, Volume: 3, Issue: 2

Cite

APA	Yıldıztepe, E., & Uzun, V. (2018). Olasılıksal Yöntemler ile Türkçe Metinlerin Anlamsal Benzerliğinin Belirlenmesi. Sinop Üniversitesi Fen Bilimleri Dergisi, 3(2), 66-78. https://doi.org/10.33484/sinopfbd.350445

Sinop Üniversitesi Fen Bilimleri Dergisi

Cited By

Sosyal Medya Platformu Üzerinde Gizli Anlam Analizi

European Journal of Science and Technology

https://doi.org/10.31590/ejosat.590521

A Turkish Dataset and BERTurk-Contrastive Model for Semantic Textual Similarity

Journal of Information Systems and Telecommunication (JIST)

https://doi.org/10.61186/jist.48127.13.49.24