Author Identification for Turkish Texts
Abstract
The main concern of author identification is to define an appropriate
characterization of documents that captures the writing style of their authors. The
most important approaches to computer-based author identification are based
exclusively on lexical measures. In this paper we present a fully automated approach
to identifying the authorship of unrestricted text by adapting a set of style markers
to the analysis of the text. In this study, 35 style markers were computed for each
author. With our method, the author of a text can be identified from the style
markers that characterize a group of authors. The author group consists of 20
different writers. The author features, including the style markers, were used
together with different machine learning algorithms. With our method we obtained an
average success rate of 80%.
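The abstract does not list the 35 style markers or name the learning algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions. The four markers (average sentence length, average word length, type-token ratio, comma rate), the nearest-centroid classifier, and the function names `style_markers`, `train_centroids`, and `identify_author` are all illustrative choices, not the authors' method.

```python
import math
import re
from collections import defaultdict


def style_markers(text):
    """Return a small, illustrative vector of style markers for one text.

    These four markers are assumptions for demonstration; the paper's
    35 markers are not specified in the abstract.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w matches Unicode letters, so Turkish characters are kept.
    # Words are not lowercased here: Turkish dotted/dotless I needs
    # locale-aware case folding, which is outside this sketch.
    words = re.findall(r"\w+", text)
    n_words = max(len(words), 1)
    return [
        len(words) / max(len(sentences), 1),   # average sentence length in words
        sum(len(w) for w in words) / n_words,  # average word length in characters
        len(set(words)) / n_words,             # type-token ratio
        text.count(",") / n_words,             # commas per word
    ]


def train_centroids(samples):
    """Map each author to the mean marker vector of that author's texts.

    samples: iterable of (author, text) pairs.
    """
    grouped = defaultdict(list)
    for author, text in samples:
        grouped[author].append(style_markers(text))
    return {
        author: [sum(column) / len(vectors) for column in zip(*vectors)]
        for author, vectors in grouped.items()
    }


def identify_author(text, centroids):
    """Return the author whose centroid is closest in Euclidean distance."""
    vector = style_markers(text)
    return min(centroids, key=lambda author: math.dist(vector, centroids[author]))
```

In practice the markers would be scaled to comparable ranges before computing distances, and the nearest-centroid step could be replaced by any standard classifier trained on the same feature vectors.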
Details
Primary Language
Turkish
Subjects
-
Section
Research Article
Publication Date
August 1, 2007
Submission Date
February 1, 2014
Acceptance Date
-
Published in Issue
Year 2007 Volume: 1 Issue: 7