The rise of social media has amplified online bullying, particularly in code-mixed languages such as Hinglish, where detecting harmful content remains challenging due to linguistic complexity and nuanced expressions. We propose QBERTox, a novel model that classifies Hinglish social media text as bullying or non-bullying by integrating a quantum-inspired layer with explainable AI techniques. Built on a fine-tuned BERT architecture, QBERTox incorporates cyberbullying-specific features, including toxicity scores from the Detoxify model and sentiment analysis, to capture semantic nuances. The quantum-inspired layer, implemented as a variational quantum circuit with 8 qubits, enhances feature entanglement for improved detection of complex linguistic patterns; with it, QBERTox outperforms classical BERT by 2.3% in F1-score on a Hinglish dataset. Our dataset comprises 6,432 annotated Hinglish tweets (46% non-bullying, 54% bullying). QBERTox achieves 85% accuracy and an F1-score of 0.85, surpassing the baselines. Explainability is provided through LIME and SHAP, which yield interpretable feature importances for bullying predictions. QBERTox offers a scalable, trustworthy solution for combating cyberbullying in multilingual contexts, with guidelines for platform integration and moderator training.
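To make the idea of a quantum-inspired layer more concrete, the following is a minimal sketch of an 8-qubit variational circuit simulated classically with NumPy. Only the qubit count comes from the abstract. The gate set (RY rotations followed by a ring of CNOTs), the angle encoding of inputs, the Pauli-Z readout, and all function names are illustrative assumptions, not the circuit used in QBERTox. How BERT, toxicity, and sentiment features would be reduced to eight input angles is likewise left open here.

```python
import numpy as np

N_QUBITS = 8  # qubit count stated in the abstract; everything else is assumed


def apply_ry(state, theta, wire):
    """Apply an RY(theta) rotation to one qubit of a (2,)*N state tensor."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    gate = np.array([[c, -s], [s, c]])
    # Contract the gate with the qubit's axis, then restore the axis order.
    state = np.tensordot(gate, state, axes=([1], [wire]))
    return np.moveaxis(state, 0, wire)


def apply_cnot(state, control, target):
    """Flip the target qubit on the part of the state where control is |1>."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    sub = state[tuple(idx)]  # the control axis is dropped in this slice
    axis = target - 1 if target > control else target
    state[tuple(idx)] = np.flip(sub, axis=axis).copy()
    return state


def z_expectations(state):
    """Return the Pauli-Z expectation value of every qubit."""
    probs = np.abs(state) ** 2
    values = []
    for wire in range(state.ndim):
        other_axes = tuple(a for a in range(state.ndim) if a != wire)
        p0, p1 = probs.sum(axis=other_axes)
        values.append(p0 - p1)
    return np.array(values)


def quantum_inspired_layer(features, weights):
    """Hypothetical layer: angle-encode 8 features, apply trainable blocks.

    features: array of shape (N_QUBITS,), used as RY encoding angles.
    weights:  array of shape (n_layers, N_QUBITS), trainable RY angles.
    Returns N_QUBITS values in [-1, 1] for a downstream classifier head.
    """
    state = np.zeros((2,) * N_QUBITS)
    state[(0,) * N_QUBITS] = 1.0  # start in |00000000>
    for wire in range(N_QUBITS):
        state = apply_ry(state, features[wire], wire)
    for layer in weights:
        for wire in range(N_QUBITS):
            state = apply_ry(state, layer[wire], wire)
        # A ring of CNOTs entangles neighbouring qubits (assumed topology).
        for wire in range(N_QUBITS):
            state = apply_cnot(state, wire, (wire + 1) % N_QUBITS)
    return z_expectations(state)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0, np.pi, size=N_QUBITS)        # stand-in feature vector
    w = rng.uniform(0, 2 * np.pi, size=(2, N_QUBITS))  # two assumed layers
    print(quantum_inspired_layer(x, w))
```

In a full model, the eight outputs would typically be concatenated with, or fed in place of, the classical features before the final classification layer, and the weights would be trained by gradient descent. A framework with automatic differentiation would be needed for that step; this sketch only covers the forward pass.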
| Primary Language | English |
|---|---|
| Subjects | Natural Language Processing |
| Journal Section | Research Article |
| Authors | |
| Submission Date | June 27, 2025 |
| Acceptance Date | July 13, 2025 |
| Publication Date | July 28, 2025 |
| Published in Issue | Year 2025 Volume: 1 Issue: 2 |