Author Identification with Machine Learning Algorithms
Abstract
Author identification is an application area of text mining. It deals with automatically predicting the most likely author of an electronic text from a predefined set of candidate authors by using author-specific writing styles. In this study, we conducted an experiment to identify the authors of Turkish-language texts using classical machine learning methods, including Support Vector Machines (SVM), Gaussian Naive Bayes (GaussianNB), Multi-Layer Perceptron (MLP), Logistic Regression (LR), and Stochastic Gradient Descent (SGD), and ensemble learning methods, including Extremely Randomized Trees (ExtraTrees) and eXtreme Gradient Boosting (XGBoost). The proposed method was applied to author groups of three different sizes (10, 15, and 20 authors) drawn from a new dataset of newspaper articles. Term frequency-inverse document frequency (TF-IDF) vectors were created from word 1-gram and 2-gram tokens. Our results show that SGD is the most successful method with word unigrams, reaching a classification accuracy of 0.976, and that LR is the most successful method with word bigrams, reaching a classification accuracy of 0.935.
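As an illustration of the feature extraction step named in the abstract, the following is a minimal sketch of building TF-IDF vectors over word unigrams and bigrams. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1`, the L2 normalization, the function names `word_ngrams` and `tfidf_vectors`, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n):
    """Build L2-normalized TF-IDF vectors (one dict per document) over word n-grams.

    Assumed weighting: tf * (ln((1 + N) / (1 + df)) + 1), where N is the number
    of documents and df is the number of documents containing the term.
    """
    counts = [Counter(word_ngrams(doc, n)) for doc in docs]
    # Iterating a Counter yields each distinct term once, so this is document frequency.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        vec = {
            term: tf * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


# Hypothetical toy documents, not taken from the paper's dataset.
docs = [
    "the economy grew faster this year",
    "the team won the match this year",
]
unigram_vecs = tfidf_vectors(docs, 1)  # word 1-gram features
bigram_vecs = tfidf_vectors(docs, 2)   # word 2-gram features
```

In a setup like the one described, vectors of this kind would serve as the input features for the classifiers listed in the abstract; terms that occur in fewer documents receive a higher weight than terms shared by all documents.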
Project Number
378
Details
Primary Language
English
Subjects
Engineering
Journal Section
Research Article
Publication Date
July 20, 2022
Submission Date
June 13, 2022
Acceptance Date
June 20, 2022
Published in Issue
Year 2022 Volume: 6 Number: 1
APA
Yülüce, İ., & Dalkılıç, F. (2022). Author Identification with Machine Learning Algorithms. International Journal of Multidisciplinary Studies and Innovative Technologies, 6(1), 45-50. https://izlik.org/JA83DX69DL