Improving Hotel Review Rating Prediction with Transformer Models
Abstract
Online review platforms have become crucial decision-making tools in the hospitality industry, where automated sentiment analysis and rating prediction offer valuable insights for both businesses and consumers. This study investigates how well transformer-based language models predict hotel review ratings and how oversampling affects their accuracy. We introduce a novel dataset of 68,785 English-language TripAdvisor reviews of hotels in Turkey, posted between 2014 and 2023. Four transformer models (BERT, DistilBERT, RoBERTa, and DeBERTa) were compared systematically from multiple perspectives, and DeBERTa achieved the highest performance of the four. Random oversampling (ROS) significantly improved classification performance: across all models, F1-scores rose from 62% to 81% and accuracy from 76% to over 82%. The oversampling approach addressed class imbalance while preserving semantic information, enabling better distinction between rating categories. Quantitative and qualitative analyses, including embedding visualization and SHAP-based interpretability studies, show that transformer models capture sentiment patterns effectively but remain sensitive to mixed sentiment and linguistic subtleties. This work contributes a novel dataset, a systematic comparison of four transformer models, and empirical evidence that oversampling is effective for sentiment analysis.
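The abstract credits random oversampling (ROS) with much of the F1 gain. The idea can be sketched in a few lines: reviews from under-represented rating classes are resampled with replacement until every class matches the size of the largest one. Because ROS copies real reviews rather than generating new text, no artificial wording is introduced. The Python below is a minimal, dependency-free illustration, not the authors' code; the function name `random_oversample`, its `seed` parameter, and the toy reviews are hypothetical, and the study may have used a library implementation (for example imbalanced-learn's `RandomOverSampler`) with different settings.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper: duplicate minority-class examples at random
    until every class has as many examples as the largest class."""
    rng = random.Random(seed)

    # Group review texts by their rating label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Keep every original example once.
        out_texts.extend(items)
        out_labels.extend([label] * len(items))
        # Sample with replacement to fill the gap up to the majority count.
        extra = rng.choices(items, k=target - len(items))
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))

    # Shuffle so duplicated examples are not grouped by class.
    order = list(range(len(out_texts)))
    rng.shuffle(order)
    return [out_texts[i] for i in order], [out_labels[i] for i in order]


if __name__ == "__main__":
    # Toy, made-up reviews with 1-5 star labels (illustration only).
    texts = ["great stay", "lovely staff", "clean room", "perfect view", "noisy", "dirty"]
    labels = [5, 5, 5, 5, 1, 2]
    x, y = random_oversample(texts, labels)
    print(Counter(y))  # every rating class now has 4 examples
```

Because the added rows are exact copies, ROS is normally applied to the training split only; oversampling before the train/test split would let duplicated reviews appear on both sides and inflate the reported scores.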
Details
Primary Language: English
Subjects: Computer Software
Journal Section: Research Article
Early Publication Date: June 1, 2026
Publication Date: June 17, 2026
Submission Date: July 23, 2025
Acceptance Date: November 28, 2025
Published in Issue: Year 2026, Volume 9, Number 2
