Nowadays, it is hard to find a part of human life that Artificial Intelligence (AI) has not been involved in. With the recent advances in AI, the change for chatbots has been an ‘evolution’ instead of a ‘revolution’. AI-powered chatbots have become an integral part of customer services as they are as functional as humans (if not more), and they can provide 24/7 service (unlike humans). There are several publicly available, widely used AI-powered chatbots. So, “Which one is better?” is a question that instinctively comes to mind and needs to be shed light on. Motivated by the question, an experimental comparison of two widely used AI-powered chatbots, namely ChatGPT and Bard, was proposed in this study. For a quantitative comparison, (i) a gold standard QA dataset, which comprised 2.390 questions from 109 topics, was used, and (ii) a novel answer-scoring algorithm was proposed. The covered chatbots were evaluated using the proposed algorithm on the dataset to reveal their (i) generated answer length, and (ii) generated answer accuracy. According to the experimental results, (i) Bard generated lengthy answers compared to ChatGPT, and (ii) Bard provided answers more similar to the ground truth compared to ChatGPT.
chatbot question answering artificial intelligence ChatGPT Bard geniş dil modeli
Nowadays, it is hard to find a part of human life that Artificial Intelligence (AI) has not been involved in. With the recent advances in AI, the change for chatbots has been an ‘evolution’ instead of a ‘revolution’. AI-powered chatbots have become an integral part of customer services as they are as functional as humans (if not more), and they can provide 24/7 service (unlike humans). There are several publicly available, widely used AI-powered chatbots. So, “Which one is better?” is a question that instinctively comes to mind and needs to shed light on. Motivated by the question, an experimental comparison of two widely used AI-powered chatbots, namely ChatGPT and Bard, was proposed in this study. For a quantitative comparison, (i) a gold standard QA dataset, which comprised 2,390 questions from 109 topics, was used and (ii) a novel answer-scoring algorithm based on cosine similarity was proposed. The covered chatbots were evaluated using the proposed algorithm on the dataset to reveal their (i) generated answer length and (ii) generated answer accuracy. According to the experimental results, (i) Bard generated lengthy answers compared to ChatGPT and (ii) Bard provided answers more similar to the ground truth compared to ChatGPT.
chatbot question answering artificial intelligence ChatGPT Bard large language model
Birincil Dil | İngilizce |
---|---|
Konular | Bilgi Sistemleri (Diğer) |
Bölüm | Makaleler |
Yazarlar | |
Erken Görünüm Tarihi | 30 Haziran 2024 |
Yayımlanma Tarihi | 30 Haziran 2024 |
Gönderilme Tarihi | 13 Kasım 2023 |
Kabul Tarihi | 6 Mart 2024 |
Yayımlandığı Sayı | Yıl 2024 Cilt: 16 Sayı: 2 |