Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre
Abstract
Authorship identification seeks to determine the writer of a text by analyzing distinctive linguistic and stylistic features. These features may vary with the author's region and age and with the genre of the text. Identifying an author's stylistic fingerprint is essential in plagiarism detection, digital forensics, and computational linguistics. In this study, the authorship features of Turkish columnists were analyzed using Artificial Neural Networks (ANN), Support Vector Machines (SVM), and tree-based algorithms (J48 and Random Forest). Sixteen stylometric indicators were selected through the Zemberek natural language processing library and evaluated across six distinct datasets. The proposed system allows flexible parameter adjustment through a graphical interface and exports results in ARFF format for reproducibility. Experimental results showed that Random Forest achieved the highest overall accuracy, particularly on the regional and age-based datasets, with F-measures reaching up to 0.91. The accuracy rates were 73% for regional classification, 55% for genre classification, and 62.5% for age-based classification. The findings indicate that combining statistical learning with stylometric analysis provides a robust framework for Turkish authorship attribution and prepares the ground for future studies employing deep learning and transformer-based models.
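The pipeline summarized in the abstract (stylometric indicators computed per text, then exported in ARFF format for classification) can be illustrated with a minimal sketch. The abstract does not list the sixteen indicators, so the four surface features below are illustrative assumptions only, and plain regular expressions stand in for Zemberek. The function names `stylometric_features` and `to_arff`, the relation name, and the class labels are hypothetical.

```python
import re


def stylometric_features(text):
    """Compute a few illustrative surface features.

    These are NOT the sixteen indicators used in the study; they are
    assumed examples of the kind of stylometric measure involved.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    n_words = len(words) or 1  # avoid division by zero on empty input
    # Note: str.lower() is not Turkish-locale aware (it maps "I" to "i",
    # not "ı"), so the type/token ratio is only approximate.
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "comma_rate": text.count(",") / n_words,
    }


def to_arff(relation, rows, labels, classes):
    """Serialize feature dictionaries and class labels as ARFF text."""
    names = list(rows[0])
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in names]
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for row, label in zip(rows, labels):
        values = ",".join(f"{row[name]:.4f}" for name in names)
        lines.append(f"{values},{label}")
    return "\n".join(lines)


# Hypothetical two-document corpus with made-up region labels.
texts = [
    "Ali geldi. Veli gitti, sonra dondu.",
    "Hava bugun guzel! Yarin ne olur?",
]
labels = ["marmara", "ege"]
rows = [stylometric_features(t) for t in texts]
print(to_arff("stylometry_demo", rows, labels, ["marmara", "ege"]))
```

The resulting text can be saved with an `.arff` extension and loaded into a toolkit that reads the format, where classifiers such as J48 or Random Forest can then be compared on the same feature set.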
Supporting Institution
This research received no external funding.
Ethical Statement
This study does not involve human or animal participants. All procedures followed scientific and ethical principles, and all referenced studies are appropriately cited.
Acknowledgments
The authors do not wish to acknowledge any individual or institution.
Details
Primary Language
English
Subjects
Classification Algorithms
Journal Section
Review Article
Publication Date
January 21, 2026
Submission Date
June 30, 2025
Acceptance Date
November 3, 2025
Published in Issue
Year: 2026, Volume: 14, Number: 1
APA
Levent, V. E., & Özbalkan, U. (2026). Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre. Duzce University Journal of Science and Technology, 14(1), 288-298. https://doi.org/10.29130/dubited.1728460