TOWARDS A DATA-DRIVEN MORPHOLOGICAL ANALYSIS OF KAZAKH LANGUAGE

Year 2014, Volume: 7 Issue: 2, 31 - 36, 21.12.2014

Abstract

We propose a method for complete morphological analysis of the Kazakh language that accounts for both inflectional and derivational morphology. Our method is data-driven and does not require manually written rules, which makes it convenient for analyzing agglutinative languages. The intuition behind our approach is to label morphemes with so-called transition labels, i.e., labels that encode the grammatical functions of morphemes as transitions between the corresponding parts of speech (POS), and to use transitivity to simplify the analysis. We evaluate our method on a fair-sized sample of real data and report encouraging results.
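
The idea of transition labels can be illustrated with a minimal sketch. The abstract does not specify the label inventory or the analysis algorithm, so everything below is an assumption made for illustration: the `(source POS, target POS)` pair notation, the `compose` helper, and the segmentation of the example word are hypothetical and are not the authors' tag set or implementation. The sketch only shows how transitivity lets a chain of morpheme labels collapse into a single POS for the whole word.

```python
from typing import List, Tuple

# A transition label is modeled here as a (source POS, target POS) pair.
# This notation is an assumption for illustration, not the paper's tag set.
Transition = Tuple[str, str]


def compose(root_pos: str, transitions: List[Transition]) -> str:
    """Follow a chain of transition labels starting from the root's POS.

    Returns the POS of the whole word form. Raises ValueError when a
    morpheme's source POS does not match the POS reached so far, which
    signals an inconsistent segmentation.
    """
    pos = root_pos
    for src, dst in transitions:
        if src != pos:
            raise ValueError(f"cannot apply {src}->{dst} to a {pos}")
        pos = dst
    return pos


# Hypothetical segmentation of "ойлау" ("thinking"):
# root "ой" (noun) + "-ла" (N->V, derivational) + "-у" (V->N, verbal noun)
final_pos = compose("N", [("N", "V"), ("V", "N")])
print(final_pos)
```

Under these assumptions, inflectional suffixes would carry labels that keep the POS unchanged (for example N->N for a plural marker), while derivational suffixes would change it, so both kinds of morphology fit the same chain.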


Details

Other ID JA37MR93HB
Journal Section Articles (Research)
Authors

Olzhas Makhambetov

Aibek Makazhanov

Zhandos Yessenbayev

Islam Sabyrgaliyev

Anuar Sharafudinov

Publication Date December 21, 2014
Published in Issue Year 2014 Volume: 7 Issue: 2

Cite

APA Makhambetov, O., Makazhanov, A., Yessenbayev, Z., Sabyrgaliyev, I., et al. (2014). TOWARDS A DATA-DRIVEN MORPHOLOGICAL ANALYSIS OF KAZAKH LANGUAGE. Türkiye Bilişim Vakfı Bilgisayar Bilimleri Ve Mühendisliği Dergisi, 7(2), 31-36.
AMA Makhambetov O, Makazhanov A, Yessenbayev Z, Sabyrgaliyev I, Sharafudinov A. TOWARDS A DATA-DRIVEN MORPHOLOGICAL ANALYSIS OF KAZAKH LANGUAGE. TBV-BBMD. December 2014;7(2):31-36.
Chicago Makhambetov, Olzhas, Aibek Makazhanov, Zhandos Yessenbayev, Islam Sabyrgaliyev, and Anuar Sharafudinov. “TOWARDS A DATA-DRIVEN MORPHOLOGICAL ANALYSIS OF KAZAKH LANGUAGE”. Türkiye Bilişim Vakfı Bilgisayar Bilimleri Ve Mühendisliği Dergisi 7, no. 2 (December 2014): 31-36.
EndNote Makhambetov O, Makazhanov A, Yessenbayev Z, Sabyrgaliyev I, Sharafudinov A (December 1, 2014) TOWARDS A DATA-DRIVEN MORPHOLOGICAL ANALYSIS OF KAZAKH LANGUAGE. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi 7 2 31–36.
IEEE O. Makhambetov, A. Makazhanov, Z. Yessenbayev, I. Sabyrgaliyev, and A. Sharafudinov, “TOWARDS A DATA-DRIVEN MORPHOLOGICAL ANALYSIS OF KAZAKH LANGUAGE”, TBV-BBMD, vol. 7, no. 2, pp. 31–36, 2014.
ISNAD Makhambetov, Olzhas et al. “TOWARDS A DATA-DRIVEN MORPHOLOGICAL ANALYSIS OF KAZAKH LANGUAGE”. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi 7/2 (December 2014), 31-36.
JAMA Makhambetov O, Makazhanov A, Yessenbayev Z, Sabyrgaliyev I, Sharafudinov A. TOWARDS A DATA-DRIVEN MORPHOLOGICAL ANALYSIS OF KAZAKH LANGUAGE. TBV-BBMD. 2014;7:31–36.
MLA Makhambetov, Olzhas et al. “TOWARDS A DATA-DRIVEN MORPHOLOGICAL ANALYSIS OF KAZAKH LANGUAGE”. Türkiye Bilişim Vakfı Bilgisayar Bilimleri Ve Mühendisliği Dergisi, vol. 7, no. 2, 2014, pp. 31-36.
Vancouver Makhambetov O, Makazhanov A, Yessenbayev Z, Sabyrgaliyev I, Sharafudinov A. TOWARDS A DATA-DRIVEN MORPHOLOGICAL ANALYSIS OF KAZAKH LANGUAGE. TBV-BBMD. 2014;7(2):31-6.

Article Acceptance

Register or log in to submit articles online.

The acceptance process for articles submitted to the journal consists of the following stages:

1. Each submitted article is first sent to at least two referees.

2. Referees are appointed by the journal editors. The journal's referee pool contains approximately 200 referees, classified by area of interest. Each referee is sent articles on subjects within their area of interest. Referees are selected so as to avoid any conflict of interest.

3. The authors' names are removed from the articles sent to the referees.

4. Referees are instructed on how to evaluate an article and are asked to fill in the journal's evaluation form.

5. Articles that receive a positive opinion from two referees are checked for similarity by the editors. The similarity score of an article is expected to be below 25%.

6. A paper that has passed all stages is reviewed by the editor for language and presentation, and the necessary corrections and improvements are made. If necessary, the authors are notified.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.