Author Identification with Machine Learning Algorithms
Abstract
Author identification is an application area of text mining. It aims to predict automatically the most likely author of an electronic text from a predefined set of candidate authors by exploiting author-specific writing styles. In this study, we conducted an experiment to identify the authors of Turkish-language texts using classical machine learning methods, namely Support Vector Machines (SVM), Gaussian Naive Bayes (GaussianNB), Multilayer Perceptron (MLP), Logistic Regression (LR), and Stochastic Gradient Descent (SGD), together with ensemble learning methods, namely Extremely Randomized Trees (ExtraTrees) and eXtreme Gradient Boosting (XGBoost). The methods were applied to author groups of three different sizes (10, 15, and 20 authors) drawn from a new dataset of newspaper articles. Term frequency-inverse document frequency (TF-IDF) vectors were created from word unigrams and word bigrams. Our results show that SGD was the most successful method with word unigrams, reaching a classification accuracy of 0.976, and that LR was the most successful method with word bigrams, reaching a classification accuracy of 0.935.
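The feature extraction step described in the abstract (TF-IDF vectors over word unigrams and word bigrams) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the helper names `word_ngrams` and `tfidf_vectors`, the whitespace tokenizer, the smoothed IDF formula, the L2 normalisation, and the toy English sentences are all assumptions chosen for illustration. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` with its `ngram_range` parameter would more likely be used.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Split on whitespace and return the word n-grams of `text` as strings.

    Assumption: plain lowercasing and whitespace tokenization. Real Turkish
    text would need locale-aware casing (dotted/dotless i) and punctuation
    handling, which this sketch omits.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n):
    """Return (vocabulary, vectors) of L2-normalised TF-IDF weights.

    Assumption: smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1,
    where N is the number of documents and df(t) the document frequency.
    """
    counts = [Counter(word_ngrams(doc, n)) for doc in docs]
    doc_freq = Counter(term for c in counts for term in c)
    vocab = sorted(doc_freq)
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}

    vectors = []
    for c in counts:
        row = [c[t] * idf[t] for t in vocab]
        # Guard against an all-zero row (document shorter than n words).
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        vectors.append([w / norm for w in row])
    return vocab, vectors


# Hypothetical toy corpus standing in for newspaper articles.
docs = [
    "the economy grew this year",
    "the team won the match",
    "the economy slowed this quarter",
]

uni_vocab, uni_vecs = tfidf_vectors(docs, 1)  # word unigrams
bi_vocab, bi_vecs = tfidf_vectors(docs, 2)    # word bigrams
```

Each row of `uni_vecs` or `bi_vecs` is one document's feature vector; vectors of this kind are what a classifier such as SGD or LR would be trained on, with the author as the class label.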
Keywords
Project Number
378
Details
Primary Language
English
Subjects
Engineering
Section
Research Article
Publication Date
July 20, 2022
Submission Date
June 13, 2022
Acceptance Date
June 20, 2022
Published in Issue
Year 2022, Volume: 6, Issue: 1
APA
Yülüce, İ., & Dalkılıç, F. (2022). Author Identification with Machine Learning Algorithms. International Journal of Multidisciplinary Studies and Innovative Technologies, 6(1), 45-50. https://izlik.org/JA83DX69DL
AMA
1. Yülüce İ, Dalkılıç F. Author Identification with Machine Learning Algorithms. IJMSIT. 2022;6(1):45-50. https://izlik.org/JA83DX69DL
Chicago
Yülüce, İbrahim, and Feriştah Dalkılıç. 2022. “Author Identification with Machine Learning Algorithms”. International Journal of Multidisciplinary Studies and Innovative Technologies 6 (1): 45-50. https://izlik.org/JA83DX69DL.
EndNote
Yülüce İ, Dalkılıç F (01 July 2022) Author Identification with Machine Learning Algorithms. International Journal of Multidisciplinary Studies and Innovative Technologies 6 1 45–50.
IEEE
[1] İ. Yülüce and F. Dalkılıç, “Author Identification with Machine Learning Algorithms”, IJMSIT, vol. 6, no. 1, pp. 45–50, Jul. 2022, [Online]. Available: https://izlik.org/JA83DX69DL
ISNAD
Yülüce, İbrahim - Dalkılıç, Feriştah. “Author Identification with Machine Learning Algorithms”. International Journal of Multidisciplinary Studies and Innovative Technologies 6/1 (01 July 2022): 45-50. https://izlik.org/JA83DX69DL.
JAMA
1. Yülüce İ, Dalkılıç F. Author Identification with Machine Learning Algorithms. IJMSIT. 2022;6:45–50.
MLA
Yülüce, İbrahim, and Feriştah Dalkılıç. “Author Identification with Machine Learning Algorithms”. International Journal of Multidisciplinary Studies and Innovative Technologies, vol. 6, no. 1, July 2022, pp. 45-50, https://izlik.org/JA83DX69DL.
Vancouver
1. İbrahim Yülüce, Feriştah Dalkılıç. Author Identification with Machine Learning Algorithms. IJMSIT [Internet]. 01 July 2022;6(1):45-50. Available from: https://izlik.org/JA83DX69DL