Deep Feature Generation for Author Identification
Abstract
Identifying the authors of a given set of texts is a well-studied yet difficult task. It requires thorough knowledge of different authors’ writing styles and the ability to discriminate between them. As the main contribution of this paper, we propose to perform this task using machine learning and deep learning methods that are state of the art in numerous complex Natural Language Processing (NLP) problems. We performed our experiments on a text corpus of daily newspaper columns written by thirty authors. The experimental results show that document embeddings trained with a neural network architecture achieve state-of-the-art accuracy in learning writing styles and identifying the authors of given writings, even though the dataset has a considerably unbalanced distribution. We present our experimental results and publish our code as an open-source GitHub repository so that interested readers and NLP enthusiasts can reproduce and confirm the results and adapt the code to their own needs.
Supporting Institution
TÜBİTAK
Project Number
3190585
Thanks
This work is part of the project “General Purpose Chatbot Application That Can Produce Meaningful Dialog via Machine Learning Algorithms”, supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) under the TEYDEB-1501 program, Project No. 3190585.
Details
Primary Language
English
Subjects
Engineering
Journal Section
Research Article
Publication Date
June 28, 2021
Submission Date
December 24, 2020
Acceptance Date
May 3, 2021
Published in Issue
Year 2021 Volume: 17 Number: 2
APA
Ozan, Ş., Taşar, D. E., & Özdil, U. (2021). Deep Feature Generation for Author Identification. Celal Bayar University Journal of Science, 17(2), 137-143. https://doi.org/10.18466/cbayarfbe.846016