Research Article

İki Düzeyli Kazak Morfolojisi (Two Level Kazakh Morphology)

Year 2021, Volume: 5 Issue: 1, 79 - 98, 29.06.2021

Abstract

This study presents a comprehensive two-level morphology of Contemporary Kazakh. The work is implemented on the Nuve Framework and tested with a disambiguation data set. Our study differs from similar studies in several respects: (i) It covers both derivational and inflectional morphology more extensively than comparable studies. (ii) Our implementation, consisting of two-level spelling rules, morphotactic rules, a lexicon of roughly 24 thousand words, and a lexicon of roughly 150 suffixes, has been released as open source, so third parties can download, review, and test it. (iii) The implementation can be extended easily by modifying existing rules or adding new ones, without any programming. (iv) Because the Nuve Framework is developed by our study group, we can solve newly arising problems quickly and easily. (v) The implementation readily handles cases such as separately written suffixes and letters formed from two symbols. (vi) Nuve also contains a two-level morphology of Turkish, which makes morphology-based machine translation possible between Turkish and the Turkic languages, which are highly similar in vocabulary, word structure, and sentence structure.
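As a rough illustration of what a two-level spelling rule has to capture, the sketch below realizes a lexical plural suffix, written here with the hypothetical archiphonemes L and A (+LAr), as one of the six surface forms -лар/-лер/-дар/-дер/-тар/-тер. A follows backness harmony with the stem's last vowel, and L surfaces as л after vowels and р, й, у, as д after л, м, н, ң, з, ж, and as т elsewhere. This is a simplified, procedural approximation written under those assumptions: it is not Nuve's rule format, it does not compile or run two-level rules, and it ignores loanwords and other exceptions.

```python
# Simplified sketch (assumption-laden): realize the Kazakh plural suffix,
# written lexically with hypothetical archiphonemes as "+LAr", on the
# surface as -лар/-лер/-дар/-дер/-тар/-тер. Native Cyrillic letters only;
# loanword vowels (и, э, ю, я, ...) and exceptions are not handled.

BACK_VOWELS = frozenset("аоұы")
FRONT_VOWELS = frozenset("әеөүі")
VOWELS = BACK_VOWELS | FRONT_VOWELS

# Stem-final letters after which L surfaces as "л" or "д";
# any other final letter (voiceless consonants, б, в, г, д) yields "т".
L_CONTEXT = VOWELS | frozenset("рйу")
D_CONTEXT = frozenset("лмнңзж")


def realize_a(stem: str) -> str:
    """Surface form of A: back 'а' or front 'е', from the stem's last vowel."""
    for ch in reversed(stem):
        if ch in BACK_VOWELS:
            return "а"
        if ch in FRONT_VOWELS:
            return "е"
    return "а"  # assumption: fall back to back harmony if no vowel is found


def realize_l(stem: str) -> str:
    """Surface form of L, conditioned on the stem-final letter."""
    last = stem[-1]
    if last in L_CONTEXT:
        return "л"
    if last in D_CONTEXT:
        return "д"
    return "т"


def pluralize(stem: str) -> str:
    """Generate the surface form of lexical stem+LAr (lowercase stems only)."""
    return stem + realize_l(stem) + realize_a(stem) + "р"


if __name__ == "__main__":
    for stem in ["бала", "үй", "қол", "кітап", "мектеп"]:
        print(stem, "->", pluralize(stem))
    # бала -> балалар, үй -> үйлер, қол -> қолдар,
    # кітап -> кітаптар, мектеп -> мектептер
```

A genuine two-level system states these alternations declaratively as lexical:surface correspondences checked in parallel by finite-state transducers, which lets the same rules drive both analysis and generation; the function above only covers the generation direction.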


Two Level Kazakh Morphology


Abstract

We present a comprehensive two-level morphological analysis of contemporary Kazakh, together with its implementation and a disambiguation test data set, on the Nuve Framework. Our study differs from similar studies in a number of ways: (i) It covers both derivational and inflectional morphology to a greater extent. (ii) Our implementation, consisting of orthographic rules, morphotactics, a root lexicon of roughly 24 thousand roots, and a lexicon of roughly 150 suffixes, is open source and can be downloaded, reviewed, and tested. (iii) Roughly 10 thousand manually disambiguated parses are available as a morphological disambiguation data set. (iv) The implementation is easily extensible: it can be modified or extended with new rules without any programming. (v) We are able to tackle emerging problems quickly and easily, since Nuve is maintained by our study group. (vi) Our implementation directly handles cases such as separately written morphemes and digraphs. (vii) Nuve also includes a Turkish morphological parser/generator, enabling morphology-based machine translation between Turkish and other Turkic languages, since these closely related languages have much in common lexically, morphologically, and syntactically.
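Morphotactics, the ordering constraints on suffixes, are commonly encoded in two-level systems as continuation classes that form a finite-state network over the lexicon. The sketch below shows the idea for a heavily simplified fragment of Kazakh nominal morphology, in which a noun root may be followed by an optional plural, then an optional possessive, then an optional case suffix (as in бала-лар-ым-ды, "my children", accusative). The state names and tags (PL, P1SG, ACC, and so on) are hypothetical and do not reflect Nuve's actual suffix identifiers or file format.

```python
from typing import Sequence

# Hypothetical, heavily simplified morphotactics for Kazakh nouns:
# ROOT -> (plural) -> (possessive) -> (case). Tag and state names are
# illustrative only, not Nuve's suffix identifiers.

POSSESSIVES = ("P1SG", "P2SG", "P3")
CASES = ("GEN", "DAT", "ACC", "LOC", "ABL")

# Each state maps a suffix tag to the next state; a missing entry means
# the suffix is not allowed at that point (continuation-class style).
TRANSITIONS = {
    "ROOT": {"PL": "PLURAL",
             **{tag: "POSS" for tag in POSSESSIVES},
             **{tag: "CASE" for tag in CASES}},
    "PLURAL": {**{tag: "POSS" for tag in POSSESSIVES},
               **{tag: "CASE" for tag in CASES}},
    "POSS": {tag: "CASE" for tag in CASES},
    "CASE": {},  # no further nominal suffixes in this fragment
}

# Every state is final here because each suffix slot is optional.
ACCEPTING = frozenset(TRANSITIONS)


def accepts(tags: Sequence[str]) -> bool:
    """Return True if the suffix tag sequence is allowed after a noun root."""
    state = "ROOT"
    for tag in tags:
        next_state = TRANSITIONS[state].get(tag)
        if next_state is None:
            return False
        state = next_state
    return state in ACCEPTING


if __name__ == "__main__":
    print(accepts(["PL", "P1SG", "ACC"]))  # бала-лар-ым-ды: allowed
    print(accepts(["ACC", "PL"]))          # case before plural: rejected
```

Keeping the ordering constraints in a data table rather than in code mirrors the abstract's point that rules can be changed without programming, although the table shape here is only an assumption for illustration.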

References

  • Abdukerim, G., Tursun, E., Yang, Y., & Li, X. (2019). Uyghur morphological analysis using joint conditional random fields: Based on small scaled corpus. Discrete & Continuous Dynamical Systems-S, 12(4&5), 823.
  • Ablimit, M., Kawahara, T., Pattar, A., & Hamdulla, A. (2016). Stem-affix based Uyghur morphological analyzer. International Journal of Future Generation Communication and Networking, 9(2), 59-72.
  • Alam, Y. S. (1983). A two-level morphological analysis of Japanese. In Texas Linguistic Forum, 22, 229-252.
  • Altintas, K., & Cicekli, I. (2001). A morphological analyser for Crimean Tatar. Proceedings of the 10th Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN’2001), North Cyprus, 180-189.
  • Antworth, E. L. (1990). PC-KIMMO: A two-level processor for morphological analysis. Summer Institute of Linguistics, International Academic Bookstore, Dallas, Texas.
  • Bekmanova, G., Sharipbay, A., Altenbek, G., Adali, E., Zhetkenbay, L., Kamanur, U., & Zulkhazhav, A. (2017). A uniform morphological analyzer for the Kazakh and Turkish languages. Proceedings of the Sixth International Conference on Analysis of Images, Social Networks and Texts, Moscow, Russia, 20–30.
  • Biray, N., Ayan, E., & Ercilasun, G. K. (2015). Çağdaş Kazak Türkçesi Ses-Şekil-Cümle Bilgisi-Metinler (2nd ed.). Istanbul, Turkey: Bilge Kültür Sanat.
  • Eryiğit, G., & Adalı, E. (2004, February). An affix stripping morphological analyzer for Turkish. Proceedings of the IASTED International Conference Artificial Intelligence and Applications, Innsbruck, Austria, 299–304.
  • Gökgöz, E., Kurt, A., Kulamshaev, K., & Kara, M. (2011, May). Two-level Qazan Tatar morphology. Proceedings of the 1st International Conference on Foreign Language Teaching and Applied Linguistics (FLTAL’11), Sarajevo, Bosnia and Herzegovina, 428-432.
  • Görmez, Z., Ünlü, B. S., Kurt, A., Kulamshaev, K., & Kara, M. (2011). An overview of two-level finite state Kyrgyz morphology. Proceedings of the 2nd International Symposium on Computing in Science & Engineering (ISCSE), Aydin, Turkey, 48-52.
  • Karttunen, L. (1983, December). KIMMO: a general morphological processor. In Texas Linguistic Forum, 22, 163-186.
  • Keskin, R. (2012). Two Level Uyghur Morphology and Uyghur Turkish Machine Translation. (Master’s Thesis). Fatih University the Graduate Institute of Sciences and Engineering, Istanbul.
  • Kessikbayeva, G., & Cicekli, I. (2014, June). Rule-based morphological analyzer of Kazakh language. In Proceedings of the 2014 Joint Meeting of SIGMORPHON and SIGFSM, Baltimore, Maryland, 46-54.
  • Kessikbayeva, G., & Cicekli, I. (2016). A rule based morphological analyzer and a morphological disambiguator for Kazakh language. Linguistics and Literature Studies, 4(1), 96-104.
  • Kim, D. B., Lee, S. J., Choi, K. S., & Kim, G. C. (1994, August). A two-level morphological analysis of Korean. In Proceedings of the 15th Conference on Computational Linguistics, Vol 1, 535-539.
  • Koskenniemi, K. (1983, August). Two-level model for morphological analysis. In Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, 683-685.
  • Makazhanov, A., Sultangazina, A., Makhambetov, O., & Yessenbayev, Z. (2015). Syntactic annotation of Kazakh: following the universal dependencies guidelines. A report. In Proceedings of the International Conference Turkic Languages Processing, TurkLang-2015, Kazan, Tatarstan, 338-350.
  • Makhambetov, O., Makazhanov, A., Yessenbayev, Z., Sabyrgaliyev, I., & Sharafudinov, A. (2014). Towards a data-driven morphological analysis of Kazakh language. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi, 7(2), 31-36.
  • Makhambetov, O., Makazhanov, A., Yessenbayev, Z., Matkarimov, B., Sabyrgaliyev, I., & Sharafudinov, A. (2013, October). Assembling the Kazakh language corpus. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 1022-1031.
  • Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing, 9(2), 137-148.
  • Orhun, M., Tantug, A. C., & Adali, E. (2009). Rule based analysis of the Uyghur nouns. International Journal on Asian Language Processing, 19(1), 33-43.
  • Shylov, M. (2010). Two level Turkmen morphology and a Turkmen Turkish machine translation. (Master’s Thesis). Fatih University the Graduate Institute of Sciences and Engineering, Istanbul.
  • Şanlı, T. (2018). Kırım Tatarcası’nın biçimbilimsel çözümlemesi ve Kırım Tatarcası-Türkçe biçimbilimsel makina çevirisi sistemi. (Master’s Thesis). Istanbul University Institute of Graduate Studies in Science and Engineering, Istanbul.
  • Tantuğ, A. C., Adalı, E., & Oflazer, K. (2006, August). Computer analysis of the Turkmen language morphology. Proceedings of the 5th International Conference on Natural Language Processing, Turku, Finland, 186-193.
  • Tyers, F. M., & Washington, J. (2015). Towards a free/open-source Universal Dependency Treebank for Kazakh. In Proceedings of the International Conference Turkic Languages Processing, TurkLang-2015, Kazan, Tatarstan, 276-289.
  • Washington, J., Salimzyanov, I., & Tyers, F. M. (2014, May). Finite-state morphological transducers for three Kypchak languages. Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC, Reykjavik, Iceland, 3378-3385.
  • Yiner, Z., Kurt, A., Kulamshaev, K., & Zafer, H. R. (2016, May). Kyrgyz orthography and morphotactics with implementation in NUVE. Proceedings of International Conference on Engineering and Natural Sciences, Sarajevo, Bosnia and Herzegovina, 1650-1658.
  • Zafer, H. R., Tilki, B., Kurt, A., & Kara, M. (2011, May). Two-level description of Kazakh morphology. Proceedings of the 1st International Conference on Foreign Language Teaching and Applied Linguistics (FLTAL’11), Sarajevo, Bosnia and Herzegovina, 560-564.
  • Zafer, H. R. Nuve: A natural language processing library for Turkish in C# [Computer software]. https://github.com/hrzafer/nuve (accessed 05.12.2020).
There are 29 citations in total.

Details

Primary Language English
Subjects Computer Software
Journal Section Research Article
Authors

Züleyha Yiner 0000-0001-7017-6114

Atakan Kurt 0000-0002-9549-8475

Publication Date June 29, 2021
Submission Date December 19, 2020
Published in Issue Year 2021 Volume: 5 Issue: 1

Cite

APA Yiner, Z., & Kurt, A. (2021). Two Level Kazakh Morphology. Acta Infologica, 5(1), 79-98.
AMA Yiner Z, Kurt A. Two Level Kazakh Morphology. ACIN. June 2021;5(1):79-98.
Chicago Yiner, Züleyha, and Atakan Kurt. “Two Level Kazakh Morphology”. Acta Infologica 5, no. 1 (June 2021): 79-98.
EndNote Yiner Z, Kurt A (June 1, 2021) Two Level Kazakh Morphology. Acta Infologica 5 1 79–98.
IEEE Z. Yiner and A. Kurt, “Two Level Kazakh Morphology”, ACIN, vol. 5, no. 1, pp. 79–98, 2021.
ISNAD Yiner, Züleyha - Kurt, Atakan. “Two Level Kazakh Morphology”. Acta Infologica 5/1 (June 2021), 79-98.
JAMA Yiner Z, Kurt A. Two Level Kazakh Morphology. ACIN. 2021;5:79–98.
MLA Yiner, Züleyha and Atakan Kurt. “Two Level Kazakh Morphology”. Acta Infologica, vol. 5, no. 1, 2021, pp. 79-98.
Vancouver Yiner Z, Kurt A. Two Level Kazakh Morphology. ACIN. 2021;5(1):79-98.