Satire identification in Turkish news articles based on ensemble of classifiers

Abstract: Social media and microblogging platforms generally contain elements of figurative and nonliteral language, including satire. The identification of figurative language is a fundamental task for sentiment analysis, since sentiment analysis methods cannot achieve high classification accuracy if elements of figurative language have not been properly identified. Satirical text is a kind of figurative language in which irony and humor are utilized to ridicule or criticize an event or entity. Satirical news is pervasive on social media platforms and can be deceptive and harmful. This paper presents an ensemble scheme for satirical news identification in Turkish news articles. In the presented scheme, the LIWC (Linguistic Inquiry and Word Count) tool has been utilized to extract linguistic and psychological feature sets (i.e. linguistic processes, psychological processes, personal concerns, spoken categories, and punctuation). In the classification phase, accuracy rates of five supervised learning algorithms (i.e. the naive Bayes algorithm, logistic regression, support vector machines, random forest, and the k-nearest neighbor algorithm) with three widely utilized ensemble methods (i.e. AdaBoost, bagging, and random subspace) have been considered. Based on the results, the random subspace ensemble of the random forest algorithm yielded the highest performance, with a classification accuracy of 96.92% for satire detection in Turkish. Among the deep learning-based architectures, a classification accuracy of 97.72% has been achieved with the recurrent neural network architecture with attention mechanism.

News satire (fake news) is a kind of expression encountered especially in journalism, in which elements of figurative language, such as satire, irony, and sarcasm, are employed [4]. News satire is a pervasive phenomenon on the Web.
Many research studies on natural language processing have been conducted to identify figurative language elements, such as sarcasm and irony [5][6][7]. Compared to the automatic identification schemes for sarcasm and irony, machine learning-based identification of satirical text has been addressed in only a few studies [8]. In addition, former research on satire identification has concentrated on the English language. In this regard, we present a classifier ensemble approach for satirical news identification in Turkish news articles. We have gathered 17,857 satirical news articles and 17,348 nonsatirical news articles from two distinct sources to examine the performance of machine learning algorithms and feature sets on satirical text detection in terms of predictive efficiency. In the feature extraction, LIWC (Linguistic Inquiry and Word Count) has been utilized [9]. LIWC is an exploratory text mining tool that extracts psychological and linguistic feature sets from text documents, and it has been successfully utilized for identification of figurative language on social media platforms [10]. Psycholinguistic feature sets have already been employed in the identification of several forms of figurative language, such as humorous text identification [11] and sarcasm and nastiness identification [12]. Features can be obtained in five major dimensions with the LIWC software: linguistic processes, psychological processes, personal concerns, spoken categories, and punctuation marks. The five LIWC dimensions and their ensemble combinations were taken into consideration. In the classification phase, accuracy rates of five supervised learning algorithms (i.e. the naive Bayes algorithm, logistic regression, support vector machines, random forest, and the k-nearest neighbor algorithm) with three widely utilized ensemble methods (i.e. AdaBoost, bagging, and random subspace) have been considered.
In this article, we propose an approach to detect satire in Turkish. The study's primary contributions can be summarized as follows: To the best of our knowledge, it is the first study of satire detection in Turkish based on machine learning. The present system empirically assesses the efficacy of satire detection based on psychological and linguistic feature sets. In addition, we have examined the predictive performance of five deep learning-based architectures in conjunction with conventional word embedding schemes for satire identification.
The rest of this article is organized as follows: in Section 2, we address briefly the related research on automatic identification of figurative language with emphasis on detection of satire. Section 3 provides the study's methodology. We present preprocessing methodology, feature extraction techniques, classification algorithms, and ensemble techniques. Experimental results are presented in Section 4. The last section concludes the paper, providing a summary of the study.

Related work
The automatic identification of figurative language in text documents can be viewed as a challenging task [3].
The previous research on automatic identification of figurative language with emphasis on satire detection is provided in this section.
In [13], Ahmad et al. introduced a machine learning-based approach to distinguish satirical from real news in journals. In the presented scheme, four feature weighting schemes (namely binary feature weighting, the TF-IDF method, the binormal separation method, and TF-IDF-BNS) were employed for feature engineering. The experimental procedure, conducted with support vector machines, indicated that the utilization of the TF-IDF method in conjunction with binormal separation can yield promising outcomes for satire identification in news corpora. In [14], Barbieri et al. introduced a machine learning-based satirical text identification strategy for advertising news in Spanish. In the presented scheme, several feature engineering schemes (namely word frequency per tweet, word ambiguity, part-of-speech tags, synonym frequency, and sentiment orientations) were taken into account. Empirical findings on the separate feature engineering schemes indicated that linguistic feature sets can produce higher predictive efficiency on satirical text classification compared to standard representation schemes, such as the bag-of-words representation. In another study, Barbieri et al. [15] evaluated the classification accuracy of machine learning-based schemes for satirical news identification on Twitter in a multilingual context, where English, Spanish, and Italian messages were considered. The classification accuracies across different languages were also examined. Rubin et al. [11] focused on the predictive performance of ensemble feature sets for satirical text detection, taking absurdity, humor, grammar, negative affect, and punctuation into account. The experimental results with the support vector machine algorithm indicated that ensemble feature sets can provide higher predictive performance in the automatic detection of fake and real news documents. Similarly, Perez-Rosas et al. [16] implemented a machine learning strategy for satirical news identification. Various feature engineering schemes (i.e. unigrams, bigrams, punctuation marks, psycholinguistic feature sets, readability characteristics, and syntax characteristics) were considered in the empirical assessment. Similarly, Ravi and Ravi [17] implemented an ensemble classification strategy on customer reviews and news. Linguistic, semantic, psychological, and unigram-based features were extracted. The ensemble classification framework used several supervised learning algorithms, including support vector machines, logistic regression, the random forest algorithm, the naive Bayes algorithm, and the multilayer perceptron. The experimental findings indicated that LIWC-based feature sets yielded promising performance in the detection of satire and irony. In another work, Ahmed et al. [18] examined the predictive efficiency of various linguistic models (unigram, bigram, trigram, and four-gram) and two weighting schemes (term frequency and TF-IDF-based representation) in identifying fake and legitimate online news. In another study, Yang et al. [19] introduced a deep learning-based architecture to detect satirical news, which used hierarchical neural networks and linguistic word embedding schemes. Recently, Ravi and Ravi [20] presented an ensemble scheme for irony identification, which integrates syntactic, semantic, and psycholinguistic feature sets.
The earlier works on satire identification focused on conventional linguistic and psycholinguistic feature sets in the English language. In the classification phase, conventional classification algorithms, such as support vector machines, naive Bayes, and C4.5, have been used. This paper differs from the earlier literature in the field in several aspects. We present a comprehensive analysis of ensemble feature sets obtained from different psycholinguistic feature sets. In addition, conventional learning methods and ensemble learners have been evaluated in the classification phase. The empirical results also report the predictive performance of deep learning-based algorithms on Turkish satire identification. All the empirical analysis has been conducted on Turkish news articles.

Methodology
The research methodology is provided in this section. The presented scheme consists of three phases: collection and preprocessing of the dataset, extraction of features based on LIWC, and classification based on supervised learners and ensemble methods. The general structure of the ensemble scheme based on LIWC features and classifiers has been presented in Figure 1. This section introduces briefly the techniques used in the system.

Dataset collection and preprocessing
In this study, we created a dataset for detecting satirical text in Turkish. To do so, we collected Turkish news from two sources where the related news is categorized as satirical and nonsatirical. We gathered the satirical news from the archives of Zaytung, which is a well-known online satirical newspaper in Turkey. To gather the nonsatirical news, we collected tweets of seven newspaper Twitter accounts: Milliyet, BBC Turkish, CNNTÜRK, Habertürk, Hürriyet, NTV, and TRTHaber. Table 1 shows the overall number of documents and vocabulary sizes of the collected news for the two categories. To examine the effect of different preprocessing strategies on the corpus, we performed empirical analysis with four different configurations. Initially, we removed all punctuation marks, numeric characters, and extra spaces (denoted by "with preprocessing" in Figure 2). For this scheme, we did not include smileys, emojis, mentions, and hashtags. In addition, we examined the predictive performance of classifiers when emojis are added, when mentions and hashtags are added, and when punctuation marks are added. The average predictive performances (in terms of classification accuracy) of the different strategies are summarized in Figure 2. Since Twitter-specific symbols (i.e. mentions and hashtags) may potentially affect the meaning of tweets, they have not been eliminated for the empirical results listed in Section 4. In addition, smileys and emojis have been included in the feature set. Since punctuation marks and numeric characters may intensify or invert the meaning of tweets, punctuation marks have not been eliminated during the preprocessing phase.
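As an illustration of the first configuration, the cleaning step can be sketched as follows. This is a minimal regex-based sketch (the paper does not specify its implementation); the option to keep mentions and hashtags mirrors the configuration used for the results in Section 4:

```python
import re

def preprocess(text, keep_hashtags_mentions=True):
    """Remove punctuation, digits, and extra whitespace from a tweet.

    Twitter-specific symbols ('#' and '@') are kept by default, since
    they may affect the meaning of a tweet, as discussed above.
    """
    if keep_hashtags_mentions:
        # Drop everything except letters, whitespace, '#' and '@';
        # \w matches Turkish letters such as 'ş' under Python 3's
        # default Unicode semantics.
        text = re.sub(r"[^\w\s#@]|\d", " ", text)
    else:
        text = re.sub(r"[^\w\s]|\d", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("Son dakika!!! #haber @ntv 3 kişi..."))
```

Emoji and smiley handling would require an additional character-class pass; the sketch above covers only punctuation, digits, and spacing.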

Feature extraction
In this research, psycholinguistic feature sets were obtained using LIWC (Linguistic Inquiry and Word Count). For satirical text identification, the psycholinguistic feature subsets were empirically assessed. Psycholinguistic feature sets based on LIWC have already been used in several computational linguistic applications, including sarcasm detection [12], satire identification [9,16], and humorous text detection [21].
LIWC is a text analysis application that extracts emotional, cognitive, and structural properties from samples of spoken and written language. The first version of LIWC was launched in 1993, and the latest version, LIWC2015, was published in 2015 [22,23]. The LIWC software comprises dictionaries in various languages, such as English, Turkish, Spanish, German, Dutch, Norwegian, Italian, and Portuguese; the dictionaries for languages other than English were obtained by translating the English dictionary. The LIWC dictionary consists of about 6,400 words, word stems, and emoticons. Each dictionary entry corresponds to one or more word categories or subdictionaries, so a single word can be assigned to one or more categories. Furthermore, the categories can be split into five primary dimensions (subsets): linguistic processes, psychological processes, personal concerns, spoken categories, and punctuation [23]. The LIWC feature sets and categories are provided in Table 2. In Table 3, the traditional LIWC dimension values for satirical and nonsatirical news articles are presented, where traditional LIWC dimensions denote the percentage of total words within the text. In addition, four summary variables (i.e. analytic, clout, authenticity, and emotional tone scores) have been provided. In addition to the basic descriptive information for satirical and nonsatirical news articles, Table 3 contains four instances with the corresponding LIWC values. Regarding the summary variables for news articles in the corpus, the average authenticity values are higher for nonsatirical news, while the emotional tone values are higher for satirical news.
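The "percentage of total words" scores can be illustrated with a toy stand-in for the LIWC dictionary. The categories and words below are illustrative only; the real LIWC2015 dictionary is proprietary and far larger:

```python
from collections import Counter

# A toy stand-in for the LIWC dictionary: each category maps to a
# small set of words (LIWC2015 has ~6,400 entries across dozens of
# categories, including word stems and emoticons).
TOY_DICT = {
    "pronoun": {"i", "we", "you", "it"},
    "negemo":  {"bad", "sad", "hate"},
    "posemo":  {"good", "happy", "love"},
}

def liwc_features(text):
    """Return, per category, the percentage of total words belonging
    to that category (LIWC's traditional dimension scores)."""
    words = text.lower().split()
    counts = Counter()
    for w in words:
        for cat, vocab in TOY_DICT.items():
            if w in vocab:
                counts[cat] += 1  # a word may hit several categories
    return {cat: 100.0 * counts[cat] / len(words) for cat in TOY_DICT}

print(liwc_features("I love it but we hate bad news"))
```

Each document is thus mapped to a fixed-length vector of category percentages, which is what the classifiers in the next section consume.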

Supervised learning methods
In this study, we used five supervised learning algorithms (the naive Bayes algorithm, logistic regression, support vector machines, random forest, and the k-nearest neighbor algorithm) to assess the predictive efficiency of the various feature sets on satire detection.

Naive Bayes algorithm
Naive Bayes (NB) is a statistical learning algorithm based on Bayes' rule and the conditional independence assumption, which considers the attributes to be conditionally independent given the class. This assumption simplifies the computations required by the algorithm; as a result, the algorithm is efficient and scales well. Though the assumption may not hold in every real-world case, the naive Bayes algorithm can yield promising results in machine learning tasks, including text classification and sentiment analysis [24,25].
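A minimal scikit-learn sketch of naive Bayes on illustrative LIWC-style percentage features (the paper's experiments use WEKA; Gaussian class-conditional densities are assumed here because the features are continuous):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Illustrative feature vectors (e.g. two LIWC dimension percentages),
# not the paper's data; 1 = satirical, 0 = nonsatirical.
X = np.array([[12.0, 1.0], [11.5, 0.8], [3.0, 6.0], [2.5, 5.5]])
y = np.array([1, 1, 0, 0])

# Under the independence assumption, each feature contributes a
# per-class Gaussian likelihood; the class posterior is their product
# times the class prior.
clf = GaussianNB().fit(X, y)
print(clf.predict([[12.0, 0.9], [2.8, 5.8]]))
```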

Logistic regression classifier
Logistic regression (LR) [25,26] is a supervised learning algorithm that provides a scheme to apply linear regression to classification tasks. It fits a linear model to a transformed target variable (the log-odds of class membership) to construct a linear classification scheme.
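A brief scikit-learn sketch on illustrative data: the fitted model passes a linear combination of the features through the logistic (sigmoid) function to obtain a class probability:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative two-feature data, not the paper's corpus.
X = np.array([[0.0, 1.0], [0.2, 0.9], [1.0, 0.1], [0.9, 0.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)
# P(class=1 | x) = sigmoid(w . x + b); thresholding at 0.5 gives a
# linear decision boundary.
proba = clf.predict_proba([[0.95, 0.05]])[0, 1]
print(round(proba, 3))
```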

Support vector machines
Support vector machines (SVMs) [25,27] are linear algorithms that can be used for classification and regression.
For two-class classification tasks, the algorithm finds a hyperplane in the feature space such that the instances of the two classes are separated with the largest possible margin. As a result, the algorithm has good generalization ability on newly encountered instances, and it can build suitable learning models even for large amounts of data.
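The mechanism can be sketched with scikit-learn on illustrative data; with a linear kernel, only the support vectors (the instances closest to the boundary) determine the hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (illustrative data).
X = np.array([[0, 0], [0, 1], [3, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 1, 1])

# A linear-kernel SVM maximizes the margin between the classes.
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[0.5, 0.5], [3.0, 3.5]]))
# The instances lying on the margin are stored as support vectors.
print(clf.support_vectors_)
```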

Random forest algorithm
Random forest (RF) is a supervised learning algorithm combining the bagging algorithm and the random subspace method [25,28]. Decision trees are used as the base learning algorithm. Each tree is constructed from a bootstrap sample of the training data, and a random selection of features is used to provide diversity among the base learners. As a result, the algorithm can yield promising learning models on datasets with noisy or irrelevant features.
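A brief scikit-learn sketch on illustrative data, making the two sources of randomness explicit:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Two well-separated clusters (illustrative data).
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = RandomForestClassifier(
    n_estimators=50,      # number of trees in the forest
    bootstrap=True,       # bagging: bootstrap sample per tree
    max_features="sqrt",  # random subspace: feature subset per split
    random_state=42,
).fit(X, y)
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))
```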

K-nearest neighbor algorithm
K-nearest neighbor (KNN) is a supervised learning algorithm for classification and regression tasks [26]. In this scheme, all training instances are stored, and at classification time the class label of an instance is determined by a majority vote among its k nearest neighbors in the training set.
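A minimal scikit-learn sketch on illustrative data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two clusters (illustrative data).
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# KNN is lazy: fit() merely stores the instances; the work happens
# at prediction time, when the 3 nearest neighbours vote.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[0.4, 0.4], [5.4, 5.4]]))
```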

Ensemble methods
Ensemble learning refers to the process of combining the predictions of multiple supervised learning algorithms, treating the algorithms as a committee of decision makers [25]. Ensemble learning schemes seek to obtain a more accurate classification model. In this study, we used ensembles of the five supervised learning algorithms with three well-known ensemble learning methods: AdaBoost, bagging, and random subspace.

AdaBoost algorithm
AdaBoost is an ensemble learning algorithm based on boosting [29]. The base learning algorithm is trained sequentially, and at each round a new learning model is built. The weights allocated to misclassified samples are increased at each round, while the weights allocated to properly classified instances are reduced. As a result, the algorithm devotes more rounds to instances that are more difficult to learn and compensates for the classification mistakes of the previous models [25].
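A minimal scikit-learn sketch on illustrative one-dimensional data; scikit-learn's default base learner for AdaBoost is a depth-1 decision tree (a stump), which each round refits under the updated sample weights:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Illustrative one-dimensional data.
X = np.array([[0], [1], [2], [3], [4], [5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# Ten boosting rounds; after each round, the weights of misclassified
# instances are increased so later stumps focus on the hard cases.
clf = AdaBoostClassifier(n_estimators=10, random_state=42).fit(X, y)
print(clf.predict([[0.5], [4.5]]))
```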

Bagging algorithm
Bagging (bootstrap aggregating) [30] is another technique for constructing an ensemble. In this scheme, distinct training subsets are obtained from the initial training set by bootstrap sampling. The predictions produced by the base learning algorithms are combined by majority voting [25].
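A minimal scikit-learn sketch on illustrative data; each base tree (the library's default base learner) sees a bootstrap sample of the instances, and the trees' predictions are combined by voting:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier

# Two clusters (illustrative data).
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = BaggingClassifier(
    n_estimators=25,   # 25 trees, each on its own bootstrap sample
    bootstrap=True,    # sample instances with replacement
    random_state=42,
).fit(X, y)
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))
```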

Random subspace algorithm
Random subspace [31] is another technique for constructing an ensemble. In this scheme, diversity among the ensemble members is achieved by partitioning the feature space: each classification algorithm operates on a different random subset of the features. The technique thus decreases overfitting while enhancing predictive efficiency [25].
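In scikit-learn, the random subspace method can be expressed as bagging over features rather than instances; a sketch on synthetic data:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.RandomState(0)
# 40 instances, 8 features; only feature 0 carries the label signal.
X = rng.randn(40, 8)
y = (X[:, 0] > 0).astype(int)

clf = BaggingClassifier(
    n_estimators=30,
    max_features=0.5,         # each tree sees 4 of the 8 features
    bootstrap=False,          # keep all instances ...
    bootstrap_features=False, # ... and subsample features without replacement
    random_state=42,
).fit(X, y)
print(clf.score(X, y))
```

With `bootstrap=False` every member trains on the full instance set, so the only source of diversity is the random feature subset, which is exactly the random subspace construction.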

Experimental results and discussions
In this section, we describe the experimental procedure and the results.

Experimental procedure
In this section, we report experimental findings as 10-fold cross-validation averages for each classifier. Five primary LIWC feature sets, namely linguistic processes (LP), psychological processes (PP), personal concerns (PC), spoken categories (SC), and punctuation marks (PM), have been taken into consideration. We used these feature sets and their ensemble combinations in the empirical analysis; thirty-one feature sets (all nonempty combinations of the five dimensions) were obtained in this manner. To evaluate the predictive performance of the LIWC-based feature sets against conventional text representation schemes, evaluation measures were also computed for the unigram model with term frequency representation. We used five supervised learning algorithms (the naive Bayes algorithm, logistic regression, support vector machines, random forest, and the k-nearest neighbor algorithm) and their ensembles with three ensemble learning techniques, namely AdaBoost, bagging, and the random subspace algorithm. In this manner, we evaluated 20 distinct classification algorithms (5 base learners and 15 ensembles) using WEKA (Waikato Environment for Knowledge Analysis) version 3.9. In Table 4, the basic parameter settings for the conventional classifiers and ensemble learning methods are presented.
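The evaluation protocol can be illustrated with a scikit-learn analogue of the 10-fold setup (the experiments themselves use WEKA 3.9; the data here are synthetic):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
# Synthetic stand-in for a feature matrix: 100 documents, 5 features.
X = rng.randn(100, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 10-fold cross-validation: the data are split into 10 folds, each
# fold serving once as the held-out test set; the reported figure is
# the mean of the 10 per-fold accuracies.
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=10)
print(len(scores), round(scores.mean(), 3))
```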

Experimental results and discussions
In Table 5, the accuracy and AUC (area under ROC curve) values acquired on distinct feature sets by standard supervised learning algorithms are provided. In Table 6, precision, recall, and F-measure values obtained by conventional classifiers are presented.
Regarding the predictive performance values obtained by the distinct psychological and linguistic feature sets on satire identification with supervised learning algorithms, presented in Table 5, the classification accuracies and AUC values of random forest usually outperform those of the other classification algorithms. Among the five classifiers, the naive Bayes algorithm generally has the second highest predictive performance in terms of accuracy and AUC values, followed by support vector machines. The k-nearest neighbor algorithm achieved the lowest predictive performance. The experimental evaluation aims to analyze the predictive performance of the LIWC-based feature sets (i.e. linguistic processes, psychological processes, personal concerns, spoken categories, and punctuation) and their ensembles as feature sets for Turkish satire identification. Besides the LIWC-based feature sets, classification accuracy and AUC values for standard text representation schemes are also given.
As can be seen from the empirical outcomes in Table 5 and Table 6, the LIWC-based feature sets outperform the term frequency unigram model (referred to as TF) for satire identification. With respect to the distinct LIWC-based feature subsets, linguistic processes (referred to as LP) accomplished the greatest predictive efficiency, psychological process-based features obtained the second highest predictive performance, and personal concerns-based feature sets reached the third highest. Among the individual feature sets, punctuation-based feature sets obtained the lowest predictive performance. Therefore, the experimental evaluation of the LIWC-based feature sets indicates that characteristics such as word count, total pronouns, personal pronouns, articles, and prepositions can be more informative for identifying satirical text. The highest predictive performance among the five individual feature sets, a classification accuracy of 89.21%, was achieved with the linguistic processes feature set in combination with the random forest algorithm.
We examined the LIWC-based ensemble feature sets for Turkish satire detection as the second concern of the study. As Table 5 and Table 6 show, ensemble feature sets combining different LIWC-based feature sets provide better predictive performance compared to the individually taken LIWC-based feature sets. The ensemble feature set that integrates all five LIWC-based feature sets, namely linguistic processes, psychological processes, personal concerns, spoken categories, and punctuation, acquired the highest classification accuracy among the 31 feature sets considered. Using this feature set in combination with the random forest algorithm yields a classification accuracy of 93.21% and an F-measure value of 0.94.
To improve the predictive performance of standard supervised learning techniques, the ensemble learning paradigm can be used. The third concern of the empirical analysis is to determine whether ensemble learners can attain better predictive performance for satire detection. In this respect, we took into consideration three well-known ensemble learners (i.e. bagging (B), AdaBoost (A), and random subspace (RS)). Tables 7-11 provide predictive performance values for the ensemble learners in combination with the standard learners and feature sets. As can be seen from the predictive performance outcomes summarized in Tables 7-11, the ensemble learners generally enhance the evaluation metric values obtained by the base learners.

Regarding the results of the distinct ensemble learning techniques, the random subspace algorithm usually accomplished the best predictive performance, followed by the AdaBoost algorithm and then the bagging algorithm. The best predictive performance among all compared configurations was achieved by the ensemble feature set that integrates the five LIWC-based feature sets, namely linguistic processes, psychological processes, personal concerns, spoken categories, and punctuation, in combination with the random subspace ensemble of random forest; a classification accuracy of 96.92% was achieved for this configuration. The predictive performance findings reported in Tables 8-11 show trends comparable to those in Table 7. To summarize the primary results of the empirical analysis, Figure 3 and Figure 4 present the main effect plots for precision and for F-measure values, respectively.
In addition to the predictive performance of conventional classifiers, ensemble learners, and feature sets, we have also considered the predictive performance of deep learning architectures on satire identification in Turkish. In Table 12, the classification accuracies and AUC values obtained by five deep learning architectures (convolutional neural network, recurrent neural network, long short-term memory, gated recurrent unit, and recurrent neural network with attention mechanism) on three word-embedding schemes are presented.
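The attention mechanism used in the best-performing architecture can be sketched in a few lines of NumPy: the recurrent hidden states are pooled into a single document vector by a softmax-weighted sum, where the weights indicate how much each time step contributes. The query vector and dimensions below are illustrative, not the paper's configuration:

```python
import numpy as np

def attention_pool(H, w):
    """Pool RNN hidden states H (T x d) into one vector using simple
    dot-product attention with a learned query vector w (d,)."""
    scores = H @ w                          # one score per time step
    alphas = np.exp(scores - scores.max())  # numerically stable softmax
    alphas /= alphas.sum()
    return alphas, alphas @ H               # weights, weighted sum

H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 steps, d = 2
w = np.array([1.0, 0.0])
alphas, context = attention_pool(H, w)
print(np.round(alphas, 3))
```

The pooled `context` vector then feeds a final classification layer; in a trained network, `w` and the recurrent weights are learned jointly.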
The predictive performance results listed in Tables 12 and 13 indicate that the GloVe word embedding scheme yields higher predictive performance compared to the other word embedding schemes, and the second highest predictive performances have been obtained by the fastText CBOW model. Regarding the predictive performances of the deep learning-based architectures for satire identification, the highest predictive performances have been obtained by the recurrent neural network with attention mechanism. To further evaluate the empirical results, we have conducted a two-way ANOVA test with the Minitab statistical toolkit. The findings of the ANOVA test are provided in Table 14, where DF, SS, MS, F, and P denote degrees of freedom, adjusted sum of squares, adjusted mean square, F-value, and probability value, respectively. According to the two-way ANOVA test outcomes, there are statistically significant differences between the predictive performances of the compared feature sets and classification algorithms (P < 0.0001).

Conclusion
Figurative language, which contains elements such as metaphor, analogy, ambiguity, irony, and satire, can be frequently encountered on social media and microblogging platforms. The automatic identification of figurative language can be regarded as a challenging task in natural language processing. In this paper, we present a machine learning-based approach for satirical text identification in Turkish. The contributions of this paper are threefold. First, we have collected a corpus of news articles in Turkish for satire identification. Second, we have performed an extensive empirical analysis of different psychological and linguistic feature set dimensions. Third, we have experimented on the use of ensemble learning methods in conjunction with ensemble feature subsets and classifiers. The comprehensive analysis on different feature sets, classifiers, and ensemble methods indicated that the utilization of psychological and linguistic feature sets in conjunction with ensemble learners can yield encouraging predictive results for satire detection in Turkish. We have achieved a classification accuracy of 96.92% with the use of ensemble feature subsets and the random subspace ensemble of the random forest algorithm.
In addition to the conventional classifiers and feature sets, deep learning architectures have also been evaluated for satire identification. The experimental results indicate that deep learning-based architectures can yield promising results on satire detection: we have obtained a classification accuracy of 97.72% with the recurrent neural network architecture with attention mechanism, using the GloVe-based word embedding scheme.
It would be beneficial to extend this study in several directions. First, we consider extending the feature subsets by taking other linguistic feature subsets into account. The experimental analysis indicates that deep learning methods can yield higher predictive performance on satire identification; hence, it would also be beneficial to extend this study by proposing a novel deep learning-based architecture. In addition, the presented schemes can be adapted to other languages in order to draw conclusions regarding different and/or similar patterns across languages.