Research Article

Analyzing the Impact of Augmentation Techniques on Deep Learning Models for Deceptive Review Detection: A Comparative Study

Year 2023, Volume 3, Issue 2, 96 - 107, 29.10.2023
https://doi.org/10.54569/aair.1329048

Abstract

Deep learning has produced compelling applications, and among them natural language processing (NLP) stands out. This study examines the role of the data augmentation training strategy in advancing NLP. Data augmentation creates synthetic training data through transformations and is a well-explored research area across machine learning domains. Beyond improving a model's generalization, it addresses challenges such as limited training data, regularization of the learning objective, and privacy protection through reduced reliance on raw data. The objective of this study is to investigate how data augmentation improves the accuracy and reliability of predictions from deep learning-based models. The study also presents a comparative analysis of deep learning models trained without data augmentation and those trained with it.
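
For readers unfamiliar with the technique, the following minimal Python sketch illustrates the kind of transformation-based text augmentation the abstract describes, using two simple label-preserving operations: random word swap and random word deletion. This is an illustration of the general idea only; the function names, parameters, and choice of operations are ours and are not taken from the authors' pipeline.

import random

def random_swap(tokens, n=1):
    """Return a copy of the token list with n random pairs of words swapped."""
    tokens = tokens.copy()
    for _ in range(n):
        if len(tokens) < 2:
            break
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    """Return a copy of the token list with each word dropped with probability p."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]  # never return an empty review

def augment(texts, labels, copies=2):
    """Expand a labeled corpus with synthetic variants of each example."""
    aug_texts, aug_labels = list(texts), list(labels)
    for text, label in zip(texts, labels):
        tokens = text.split()
        for _ in range(copies):
            op = random.choice([random_swap, random_deletion])
            aug_texts.append(" ".join(op(tokens)))
            aug_labels.append(label)  # transformations are assumed label-preserving
    return aug_texts, aug_labels

# Example: a tiny review training set grows threefold.
texts = ["the room was absolutely wonderful and clean",
         "best hotel ever i will definitely come back"]
labels = [0, 1]  # 0 = truthful, 1 = deceptive (illustrative labels)
aug_texts, aug_labels = augment(texts, labels, copies=2)
print(len(aug_texts))  # 6

At this level of abstraction, training a classifier on aug_texts/aug_labels rather than on the original lists is what the with-augmentation condition of the comparative study amounts to.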

References

  • Ahmed H, Traore I, Saad S. "Detecting opinion spams and fake news using text classification", Security and Privacy, 1(1): e9, 2018.
  • Bengio Y. "Learning deep architectures for AI", Foundations and Trends in Machine Learning, 2(1): 1-127, 2009.
  • Algur SP, Patil AP, Hiremath PS, Shivashankar S. "Conceptual level similarity measure-based review spam detection", In International Conference on Signal and Image Processing, pp. 416-423, IEEE, 2010.
  • Lau RY, Liao SY, Kwok RC, Xu K, Xia Y, Li Y. "Text mining and probabilistic language modeling for online review spam detection", ACM Transactions on Management Information Systems (TMIS), 2(4): 1-30, 2012.
  • Jindal N, Liu B. "Opinion spam and analysis", In Proceedings of the International Conference on Web Search and Data Mining, pp. 219-230, 2008.
  • Choi W, Nam K, Park M, Yang S, Hwang S, Oh H. "Fake review identification and utility evaluation model using machine learning", Frontiers in Artificial Intelligence, 5: 1064371, 2023.
  • Yu AW, Dohan D, Luong MT, Zhao R, Chen K, Norouzi M, Le QV. "QANet: Combining local convolution with global self-attention for reading comprehension", CoRR abs/1804.09541, 2018. URL: https://arxiv.org/pdf/1804.09541.
  • Kobayashi S. "Contextual augmentation: Data augmentation by words with paradigmatic relations", In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 452-457, New Orleans, Louisiana, 2018.
  • Xie Z, Wang SI, Li J, Lévy D, Nie A, Jurafsky D, Ng AY. "Data noising as smoothing in neural network language models", In 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, April 24-26, 2017.
  • LeBaron B, Weigend AS. "A bootstrap evaluation of the effect of data splitting on financial time series", IEEE Transactions on Neural Networks, 9(1): 213-220, 1998.
  • Coates A, Ng A, Lee H. "An analysis of single-layer networks in unsupervised feature learning", In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 2011.
  • Cunningham P, Carney J, Jacob S. "Stability problems with artificial neural networks and the ensemble solution", Artificial Intelligence in Medicine, 20(3): 217-225, 2000.
  • Dolgikh S. "Identifying explosive epidemiological cases with unsupervised machine learning", medRxiv preprint, 2020.
  • Hornik K, Stinchcombe M, White H. "Multilayer feedforward networks are universal approximators", Neural Networks, 2(5): 359-366, 1989.
  • Izonin I, Tkachenko R, Dronyuk I, Tkachenko P, Gregus M, Rashkevych M. "Predictive modeling based on small data in clinical medicine: RBF-based additive input-doubling method", Mathematical Biosciences and Engineering, 18(3): 2599-2613, 2021.
  • Karar ME. "Robust RBF neural network-based backstepping controller for implantable cardiac pacemakers", International Journal of Adaptive Control and Signal Processing, 32(7): 1040-1051, 2018.
  • Ott M, Choi Y, Cardie C, Hancock JT. "Finding deceptive opinion spam by any stretch of the imagination", arXiv preprint arXiv:1107.4557, 2011.
  • Prystavka P, Cholyshkina O, Dolgikh S, Karpenko D. "Automated object recognition system based on convolutional autoencoder", In 2020 10th International Conference on Advanced Computer Information Technologies (ACIT), IEEE, 2020.
  • Corona Rodriguez R, Alaniz S, Akata Z. "Modeling conceptual understanding in image reference games", Advances in Neural Information Processing Systems, 32, 2019.
  • Li J, Ott M, Cardie C, Hovy E. "Towards a general rule for identifying deceptive opinion spam", In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1566-1576, 2014.
  • Salah I, Jouini K, Korbaa O. "Augmentation-based ensemble learning for stance and fake news detection", In Advances in Computational Collective Intelligence – 14th International Conference, ICCCI 2022, Communications in Computer and Information Science, Vol. 1653, pp. 29-41, 2022.
  • Xie Q, Dai Z, Hovy E, Luong T, Le Q. "Unsupervised data augmentation for consistency training", Advances in Neural Information Processing Systems, 33: 6256-6268, 2020.
  • Shorten C, Khoshgoftaar TM, Furht B. "Text data augmentation for deep learning", Journal of Big Data, 8(1): 1-34, 2021.
  • Min J, McCoy RT, Das D, Pitler E, Linzen T. "Syntactic data augmentation increases robustness to inference heuristics", In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2339-2352, 2020.
  • Huang L, Wu L, Wang L. "Knowledge graph-augmented abstractive summarization with semantic-driven cloze reward", In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5094-5107, 2020.
  • Glavaš G, Vulić I. "Is supervised syntactic parsing beneficial for language understanding tasks? An empirical investigation", In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 3090-3104, 2021.
  • Li MM, Huang K, Zitnik M. "Representation learning for networks in biology and medicine: advancements, challenges, and opportunities", arXiv preprint arXiv:2104.04883, 2021.
  • Zhao T, Liu Y, Neves L, Woodford O, Jiang M, Shah N. "Data augmentation for graph neural networks", In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, No. 12, pp. 11015-11023, 2021.
  • Kong K, Li G, Ding M, Wu Z, Zhu C, Ghanem B, Taylor G, Goldstein T. "FLAG: adversarial data augmentation for graph neural networks", arXiv preprint arXiv:2010.09891, 2020.
  • Devlin J, Chang MW, Lee K, Toutanova K. "BERT: Pre-training of deep bidirectional transformers for language understanding", In Proceedings of NAACL-HLT 2019, Vol. 1, pp. 4171-4186, 2019.
  • Ester M, Kriegel HP, Sander J, Xu X. "A density-based algorithm for discovering clusters in large spatial databases with noise", In KDD, Vol. 96, No. 34, pp. 226-231, 1996.
  • Forman G, Cohen I. "Learning from little: Comparison of classifiers given little training", In European Conference on Principles of Data Mining and Knowledge Discovery, Springer, Berlin, Heidelberg, 2004.
  • Fischer A, Igel C. "Training restricted Boltzmann machines: An introduction", Pattern Recognition, 47(1): 25-39, 2014.
  • Hekler EB, Klasnja P, Chevance G, Golaszewski NM, Lewis D, Sim I. "Why we need a small data paradigm", BMC Medicine, 17(1): 1-9, 2019.
  • Mukherjee A, Liu B, Glance N. "Spotting fake reviewer groups in consumer reviews", In Proceedings of the 21st International Conference on World Wide Web, pp. 191-200, 2012.
  • Shojaee S, Murad MA, Azman AB, Sharef NM, Nadali S. "Detecting deceptive reviews using lexical and syntactic features", In 2013 13th International Conference on Intelligent Systems Design and Applications, pp. 53-58, IEEE, 2013.
  • Sanh V, Debut L, Chaumond J, Wolf T. "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter", arXiv preprint arXiv:1910.01108, 2019.
  • Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V. "RoBERTa: A robustly optimized BERT pretraining approach", arXiv preprint arXiv:1907.11692, 2019.
  • Clark K, Luong MT, Le QV, Manning CD. "ELECTRA: Pre-training text encoders as discriminators rather than generators", arXiv preprint arXiv:2003.10555, 2020.
  • Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J. "LSTM: A search space odyssey", IEEE Transactions on Neural Networks and Learning Systems, 28(10): 2222-2232, 2017.
There are 40 citations in total.

Details

Primary Language English
Subjects Natural Language Processing
Journal Section Research Articles
Authors

Anusuya Krishnan (ORCID: 0009-0005-6932-4147)

Kennedyraj Mariafrancis (ORCID: 0009-0001-2481-0943)

Early Pub Date October 23, 2023
Publication Date October 29, 2023
Acceptance Date October 18, 2023
Published in Issue Year 2023

Cite

IEEE A. Krishnan and K. Mariafrancis, “Analyzing the Impact of Augmentation Techniques on Deep Learning Models for Deceptive Review Detection: A Comparative Study”, Adv. Artif. Intell. Res., vol. 3, no. 2, pp. 96–107, 2023, doi: 10.54569/aair.1329048.

Advances in Artificial Intelligence Research is an open access journal, meaning its content is freely available without charge to users and their institutions. All papers are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows users to distribute, remix, adapt, and build upon the material in any medium or format for non-commercial purposes only, provided attribution is given to the creator.
