This study develops and evaluates a machine learning (ML) classification model that distinguishes texts generated by artificial intelligence (AI) from texts written by humans. Using a dataset of 487,235 text samples, several ML algorithms were trained and evaluated on this task: Multilayer Perceptron (MLP), Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), Support Vector Machines (SVM), Decision Trees (DT), and an Ensemble Model. The Ensemble Model, which combines the best-performing algorithms, achieved an accuracy of 99.90%, outperforming the individual models. The study also presents a user-friendly interface that classifies texts in real time using the weights of the ensemble model; it could serve as a practical tool for researchers and professionals in fields such as education, academia, and media. The model's generalization capability was tested through this interface on a user-generated dataset, where its performance was consistent with that on the primary dataset and reached the "Almost Perfect" level of agreement according to the Kappa statistic. The study highlights the need for robust tools to mitigate the ethical and security risks associated with AI-generated content, and the results suggest that ensemble models are promising for complex classification tasks.
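
To make the Kappa result above concrete, the following is a minimal sketch of how Cohen's kappa can be computed and mapped to a verbal agreement band. It assumes the "Almost Perfect" label refers to the top band (0.81 to 1.00) of the commonly used Landis and Koch scale, which the abstract does not state explicitly. The function names `cohen_kappa` and `landis_koch_label` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of class labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items where the two label sets match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over classes.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case (one identical class everywhere); returning 1.0
        # is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch verbal band (assumed scale)."""
    if kappa < 0:
        return "Poor"
    bands = [(0.20, "Slight"), (0.40, "Fair"),
             (0.60, "Moderate"), (0.80, "Substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost Perfect"


# Toy example (made-up labels): 20 texts, one disagreement.
true_labels = ["ai"] * 10 + ["human"] * 10
predicted = ["ai"] * 9 + ["human"] * 11
kappa = cohen_kappa(true_labels, predicted)
print(f"kappa = {kappa:.2f}, band = {landis_koch_label(kappa)}")
```

In this toy case the observed agreement is 0.95 and the chance agreement is 0.50, so kappa is about 0.90. Unlike raw accuracy, kappa discounts the agreement that would be expected from the class frequencies alone.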
Natural Language Processing, Artificial Intelligence and Ethics, Machine Learning, Ensemble Models, Text Classification
| Primary Language | English |
|---|---|
| Subjects | Natural Language Processing |
| Journal Section | Research Article |
| Authors | |
| Submission Date | October 4, 2025 |
| Acceptance Date | February 9, 2026 |
| Publication Date | March 24, 2026 |
| DOI | https://doi.org/10.17798/bitlisfen.1796956 |
| IZ | https://izlik.org/JA57CX47YY |
| Published in Issue | Year 2026 Volume: 15 Issue: 1 |