Two charts representing how the tested AI chatbots scored in a real-world scenario. When it came to factual accuracy, ChatGPT-4, the latest from OpenAI, came close to the passing grade expected of an undergraduate student. It was closely followed by its predecessor, ChatGPT-3.5, Microsoft's Bing and Google's Bard. However, "independent" chatbots Claude 2 and Aria trailed behind with notably lower scores.

ChatGPT vs Bard vs Bing vs Claude 2 vs Aria vs human-expert. How good are AI chatbots at scientific writing?

While large language models have revolutionised content generation, their ability to produce original scientific contributions in the humanities remains limited. We expect this to change in the near future as current large language model-based AI chatbots...