Research Article

A NAMED ENTITY RECOGNITION MODEL FOR TURKISH LECTURE NOTES IN HISTORY AND GEOGRAPHY DOMAINS

Year 2019, Volume: 7 Issue: 3, 539 - 551, 15.09.2019
https://doi.org/10.21923/jesd.448251

Abstract

Named entity recognition (NER) is an information extraction (IE) task within
natural language processing (NLP) and text mining. Its scope and methods differ
between studies, but its basic aim is to correctly detect expressions in a text
that denote a person, location, organization, and so on. In this study, a NER
system is developed for Turkish lecture notes in history and geography courses.
Taken on its own, the system is a project specialized for an information
extraction task. It also has educational value: its expected output is the set
of meaningful words or word groups found in the input lecture notes, which can
be used to build glossaries of terms for individual courses or course subjects.
These glossaries are intended to help identify expressions in a lecture note
that are suitable as question material and thus to support test preparation.
This paper gives general information about the NER task and its scope, reviews
previous studies in the field, introduces the system developed in this study,
evaluates its performance on experimental results, and discusses possibilities
for improvement.


Details

Primary Language English
Subjects Computer Software
Journal Section Research Articles
Authors

Önder Can Sarı 0000-0003-4226-9633

Özlem Aktaş 0000-0001-6415-0698

Publication Date September 15, 2019
Submission Date July 26, 2018
Acceptance Date April 3, 2019
Published in Issue Year 2019 Volume: 7 Issue: 3

Cite

APA Sarı, Ö. C., & Aktaş, Ö. (2019). A NAMED ENTITY RECOGNITION MODEL FOR TURKISH LECTURE NOTES IN HISTORY AND GEOGRAPHY DOMAINS. Mühendislik Bilimleri Ve Tasarım Dergisi, 7(3), 539-551. https://doi.org/10.21923/jesd.448251