Research Article

A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL

Year 2010, Volume: 11 Issue: 2, 163 - 172, 29.11.2010

Abstract

We used the Lemur Toolkit, an open source toolkit designed for information retrieval research, for automated indexing and retrieval experiments on a TREC-like test collection for the Turkish language. We investigated the effectiveness of three retrieval models that Lemur supports, especially the language modeling approach to information retrieval, combined with language-specific preprocessing techniques. Our experiments show that language-specific preprocessing significantly improves retrieval performance for all retrieval models. The language modeling approach is also the best-performing retrieval model when language-specific preprocessing is applied.
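As a rough illustration of what the language modeling approach to retrieval computes, the following minimal Python sketch ranks documents by the log-likelihood of the query under a smoothed unigram model of each document. It is not the Lemur Toolkit's code or API. Dirichlet prior smoothing is assumed here only because it is a common choice; the abstract does not say which smoothing method was used. The function names, the value of `mu`, and the toy pre-stemmed Turkish tokens are all hypothetical.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2000.0):
    """Log query likelihood of one document under a Dirichlet-smoothed unigram model.

    Assumption: mu=2000 is an illustrative smoothing value, not a setting from the paper.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Collection (background) probability of the term.
        p_coll = collection_tf.get(term, 0) / collection_len if collection_len else 0.0
        if p_coll == 0.0:
            continue  # term never occurs in the collection: it cannot separate documents
        # Dirichlet smoothing: mix the document counts with the background model.
        p_term = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_term)
    return score


def rank(query_terms, docs, mu=2000.0):
    """Return (doc_id, score) pairs sorted from best to worst match."""
    collection_tf = Counter(t for terms in docs.values() for t in terms)
    collection_len = sum(collection_tf.values())
    scored = [
        (doc_id, dirichlet_score(query_terms, terms, collection_tf, collection_len, mu))
        for doc_id, terms in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy collection; tokens are assumed to be already stemmed.
    toy_docs = {
        "d1": ["kitap", "oku", "kitap"],
        "d2": ["ev", "git", "okul"],
    }
    for doc_id, score in rank(["kitap"], toy_docs):
        print(doc_id, round(score, 4))
```

In this sketch, language-specific preprocessing such as stemming would happen before the token lists are built, so that inflected forms of a Turkish word map to the same term and contribute to the same counts.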

Turkish title: TÜRKÇE METİN GERİ GETİRİMİNDE DİL MODELLEME YAKLAŞIMI

Abstract (Turkish version, translated)

In this study, automated indexing and retrieval experiments were carried out on a TREC-like collection prepared for the Turkish language, using Lemur, an open source tool designed for information retrieval research. Three retrieval models supported by Lemur, chief among them the language modeling approach to information retrieval, were investigated together with language-specific preprocessing techniques. Our experiments showed that language-specific preprocessing techniques improve retrieval performance for all retrieval models. In addition, the best performance for Turkish was obtained with the language modeling approach.

Details

Primary Language English
Subjects Engineering
Section Research Article
Authors

Ozgur Yilmazel

Publication Date November 29, 2010
Published in Issue Year 2010 Volume: 11 Issue: 2

Cite

APA Yilmazel, O. (2010). A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering, 11(2), 163-172.
AMA Yilmazel O. A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. AUBTD-A. December 2010;11(2):163-172.
Chicago Yilmazel, Ozgur. “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 11, no. 2 (December 2010): 163-72.
EndNote Yilmazel O (01 December 2010) A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 11 2 163–172.
IEEE O. Yilmazel, “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”, AUBTD-A, vol. 11, no. 2, pp. 163–172, 2010.
ISNAD Yilmazel, Ozgur. “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering 11/2 (December 2010), 163-172.
JAMA Yilmazel O. A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. AUBTD-A. 2010;11:163–172.
MLA Yilmazel, Ozgur. “A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL”. Anadolu University Journal of Science and Technology A - Applied Sciences and Engineering, vol. 11, no. 2, 2010, pp. 163-72.
Vancouver Yilmazel O. A LANGUAGE MODELING APPROACH TO TURKISH TEXT RETRIEVAL. AUBTD-A. 2010;11(2):163-72.