Review Article

Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre

Volume: 14 Number: 1 January 21, 2026

Abstract

Authorship identification seeks to determine the writer of a text by analyzing its distinctive linguistic and stylistic features, which may vary with the author's region and age and with the genre of the text. Identifying an author's stylistic fingerprint is essential to plagiarism detection, digital forensics, and computational linguistics. In this study, the authorship features of Turkish columnists were analyzed using Artificial Neural Networks (ANN), Support Vector Machines (SVM), and tree-based algorithms (the J48 decision tree and the Random Forest ensemble). Sixteen stylometric indicators, obtained with the Zemberek natural language processing library, were evaluated across six distinct datasets. The proposed system allows flexible parameter adjustment through a graphical interface and exports results in ARFF format for reproducibility. Experimental results showed that Random Forest achieved the highest overall accuracy, particularly on the regional and age-based datasets, with F-measures of up to 0.91. The accuracy rates were 73% for regional classification, 55% for genre classification, and 62.5% for age-based classification. The findings suggest that combining statistical learning with stylometric analysis provides a robust framework for Turkish authorship attribution, and they lay the groundwork for future studies employing deep learning and transformer-based models.
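The abstract describes a pipeline that computes stylometric indicators for each text and exports them in ARFF, the attribute-relation file format used by the WEKA toolkit. The Python sketch below illustrates that idea under explicit assumptions: the four features (`avg_word_len`, `avg_sentence_len`, `type_token_ratio`, `comma_rate`) are hypothetical stand-ins for the paper's sixteen indicators, a plain regular-expression tokenizer replaces Zemberek, and the region labels are invented for illustration. It is not the authors' implementation.

```python
import re
from typing import Dict, List, Sequence

# HYPOTHETICAL feature set: four illustrative surface measures, not the
# paper's sixteen Zemberek-derived indicators (the abstract does not list them).
FEATURES: List[str] = [
    "avg_word_len",
    "avg_sentence_len",
    "type_token_ratio",
    "comma_rate",
]

# In Python 3, \w is Unicode-aware, so Turkish letters (ç, ğ, ı, ö, ş, ü) count
# as word characters. This regex tokenizer is a stand-in for Zemberek, not its API.
WORD_RE = re.compile(r"\w+")
# A run of sentence-final punctuation ("...", "?!") counts as one boundary.
SENTENCE_END_RE = re.compile(r"[.!?]+")


def extract_features(text: str) -> Dict[str, float]:
    """Compute the illustrative stylometric measures for one text."""
    words = WORD_RE.findall(text)
    n_words = max(1, len(words))  # guard against empty input
    n_sentences = max(1, len(SENTENCE_END_RE.findall(text)))
    # Note: str.lower() is not Turkish-aware ("I" becomes "i", not "ı"),
    # which can slightly distort the type-token ratio.
    types = {w.lower() for w in words}
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sentence_len": len(words) / n_sentences,
        "type_token_ratio": len(types) / n_words,
        "comma_rate": text.count(",") / n_words,
    }


def to_arff(relation: str, rows: Sequence[Dict[str, float]], labels: Sequence[str],
            class_name: str, class_values: Sequence[str]) -> str:
    """Serialize feature rows plus a nominal class attribute as ARFF text.

    Assumes relation, attribute and class names are simple identifiers
    (no spaces, commas or quotes), so no ARFF quoting is needed.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in FEATURES]
    # Nominal attributes must declare every permitted value in the header.
    lines.append(f"@ATTRIBUTE {class_name} {{{','.join(class_values)}}}")
    lines += ["", "@DATA"]
    for feats, label in zip(rows, labels):
        if label not in class_values:
            raise ValueError(f"label {label!r} not declared in class_values")
        values = [f"{feats[name]:.4f}" for name in FEATURES]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-document corpus with hypothetical region labels.
    texts = [
        "Bugün hava çok güzel. Sahile gittik, denize girdik.",
        "Ekonomi yavaşlıyor! Uzmanlar faizlerin düşmesini bekliyor.",
    ]
    labels = ["Ege", "Marmara"]
    rows = [extract_features(t) for t in texts]
    print(to_arff("turkish_columns", rows, labels, "region", ["Ege", "Marmara"]))
```

A file produced this way could be loaded by an ARFF-aware tool such as WEKA, exposing the numeric attributes and the nominal `region` class to classifiers like J48 or Random Forest. The sketch rejects undeclared labels because ARFF requires every nominal value to be listed in the header.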

Keywords

Supporting Institution

This research received no external funding.

Ethical Statement

This study does not involve human or animal participants. All procedures followed scientific and ethical principles, and all referenced studies are appropriately cited.

Acknowledgments

The authors have no acknowledgments to declare.

Details

Primary Language: English
Subjects: Classification Algorithms
Journal Section: Review Article
Publication Date: January 21, 2026
Submission Date: June 30, 2025
Acceptance Date: November 3, 2025
Published in Issue: Year 2026 Volume: 14 Number: 1

APA
Levent, V. E., & Özbalkan, U. (2026). Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre. Duzce University Journal of Science and Technology, 14(1), 288-298. https://doi.org/10.29130/dubited.1728460
AMA
1. Levent VE, Özbalkan U. Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre. DUBİTED. 2026;14(1):288-298. doi:10.29130/dubited.1728460
Chicago
Levent, Vecdi Emre, and Uğur Özbalkan. 2026. “Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre.” Duzce University Journal of Science and Technology 14 (1): 288-98. https://doi.org/10.29130/dubited.1728460.
EndNote
Levent VE, Özbalkan U (January 1, 2026) Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre. Duzce University Journal of Science and Technology 14 1 288–298.
IEEE
[1] V. E. Levent and U. Özbalkan, “Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre,” DUBİTED, vol. 14, no. 1, pp. 288–298, Jan. 2026, doi: 10.29130/dubited.1728460.
ISNAD
Levent, Vecdi Emre - Özbalkan, Uğur. “Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre”. Duzce University Journal of Science and Technology 14/1 (January 1, 2026): 288-298. https://doi.org/10.29130/dubited.1728460.
JAMA
1. Levent VE, Özbalkan U. Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre. DUBİTED. 2026;14(1):288–298. doi:10.29130/dubited.1728460
MLA
Levent, Vecdi Emre, and Uğur Özbalkan. “Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre.” Duzce University Journal of Science and Technology, vol. 14, no. 1, Jan. 2026, pp. 288-98, doi:10.29130/dubited.1728460.
Vancouver
1. Vecdi Emre Levent, Uğur Özbalkan. Stylometric Profiling of Turkish Texts: Joint Estimation of Author, Region, Age and Genre. DUBİTED. 2026 Jan 1;14(1):288-98. doi:10.29130/dubited.1728460