Abstract
This paper presents an effective tweet classification system that fully supports the Turkish language. The proposed system can be used to mine (classify) recently published, publicly available tweets in order to find those most relevant and useful during a crisis. Such tweets improve situational awareness, which helps responders choose appropriate actions to prevent, or at least reduce, the impact of the crisis. An in-depth study was carried out to improve and optimize the proposed system. Specifically, extensive experiments were performed to investigate the performance of several well-known machine learning algorithms, namely K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Naive Bayes (NB), when used for text (tweet) classification. The performance of ensembles built from the studied algorithms, as well as of the Random Forest (RF), AdaBoost, and Gradient Boosting Classifier (GBC) ensemble methods, was also evaluated. The experimental evaluation and analysis show that the proposed approach is stable and robust and achieves good performance on Turkish text. The performance of the proposed classifier was also compared with two state-of-the-art text classification approaches, "Empirical" and "Turkish Deep".