Research Article

The Battle of Chatbot Giants: An Experimental Comparison of ChatGPT and Bard

Year 2024, Volume: 16 Issue: 2, 679 - 691, 30.06.2024
https://doi.org/10.29137/umagd.1390083

Abstract

Nowadays, it is hard to find a part of human life in which Artificial Intelligence (AI) is not involved. With the recent advances in AI, the change for chatbots has been an ‘evolution’ instead of a ‘revolution’. AI-powered chatbots have become an integral part of customer services, as they are as functional as humans (if not more) and, unlike humans, can provide 24/7 service. Several AI-powered chatbots are publicly available and widely used, so “Which one is better?” is a question that instinctively comes to mind and deserves an answer. Motivated by this question, this study presents an experimental comparison of two widely used AI-powered chatbots, namely ChatGPT and Bard. For a quantitative comparison, (i) a gold standard QA dataset comprising 2,390 questions from 109 topics was used, and (ii) a novel answer-scoring algorithm based on cosine similarity was proposed. The chatbots were evaluated on the dataset using the proposed algorithm to measure (i) the length and (ii) the accuracy of their generated answers. According to the experimental results, (i) Bard generated longer answers than ChatGPT, and (ii) Bard’s answers were more similar to the ground truth than ChatGPT’s.
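
As a rough illustration of what scoring a generated answer by its similarity to the ground truth can look like, the sketch below computes cosine similarity over simple bag-of-words counts. It is not the authors' algorithm: the tokenizer, the function names (`tokenize`, `cosine_similarity`, `answer_length`), and the choice to measure length in tokens are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Assumed tokenizer: lowercase the text and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(answer: str, ground_truth: str) -> float:
    # Term-frequency vectors for the generated answer and the reference answer.
    a = Counter(tokenize(answer))
    b = Counter(tokenize(ground_truth))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # An empty text has no direction, so its similarity is defined as 0 here.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer_length(answer: str) -> int:
    # Assumed length measure: number of tokens in the answer.
    return len(tokenize(answer))


# Hypothetical usage with made-up strings (not taken from the study's dataset).
truth = "Ankara is the capital of Turkey"
generated = "The capital city of Turkey is Ankara, located in central Anatolia"
print(answer_length(generated), cosine_similarity(generated, truth))
```

With raw term counts, a long answer that repeats the reference wording plus extra detail scores lower than an exact restatement, which shows why answer length and similarity are worth reporting separately.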

There are 40 references in total.

Details

Primary Language: English
Subjects: Information Systems (Other)
Section: Articles
Authors

Abdullah Talha Kabakuş 0000-0003-2181-4292

İbrahim Dogru 0000-0001-9324-7157

Early View Date: June 30, 2024
Publication Date: June 30, 2024
Submission Date: November 13, 2023
Acceptance Date: March 6, 2024
Published in Issue: Year 2024, Volume: 16 Issue: 2

Cite

APA Kabakuş, A. T., & Dogru, İ. (2024). The Battle of Chatbot Giants: An Experimental Comparison of ChatGPT and Bard. International Journal of Engineering Research and Development, 16(2), 679-691. https://doi.org/10.29137/umagd.1390083
All rights reserved. Kırıkkale University, Faculty of Engineering.