ChatGPT vs Bard vs Bing vs Claude 2 vs Aria vs human-expert. How good are AI chatbots at scientific writing?
While large language models have revolutionised content generation, their ability to produce original scientific contributions in the humanities remains limited. We expect this to change in the near future as current large language model-based AI chatbots evolve into large language model-powered software.
Historically, the mastery of writing was seen as a cornerstone of human progress. However, the advent of advanced generative AI has ushered in a transformative era, one that has profound implications for scientific discourse.
Edisa Lozić and Benjamin Štular took a closer look at how AI chatbots fare when tasked with scholarly writing in the humanities and archaeology. The study assessed six leading AI chatbots on two main criteria: the factual accuracy of their content, similar to grading students, and the quality and originality of their scientific contributions, similar to peer reviewing a scientific paper.
When it came to factual accuracy, ChatGPT-4, the latest from OpenAI, came close to the passing grade expected of an undergraduate student. It was closely followed by its predecessor, ChatGPT-3.5, Microsoft's Bing and Google's Bard. However, "independent" chatbots Claude 2 and Aria trailed behind with notably lower scores.
But here's the twist: while these chatbots were skilled at piecing together existing knowledge, they all fell short in producing truly original scientific content. This finding sheds light on the intricate nature of human research and the unique processes we use to convert raw data into new knowledge. As of now, it appears that this talent for originality remains a distinctly human trait.
However, the authors expect development in the field of AI to shift towards large language model-powered software in the near future. In fact, the technology already exists for a tool that could collect and analyse raw data, interpret it, describe it and publish it, just as researchers do.
So a future in which so-called general artificial intelligence generates original scientific contributions is not far away. However, understanding our world, a fundamental aspiration of the humanities, will only be partially achieved through the use of black-box AI. Since the humanities, like justice, are as much about processes as outcomes, humanities scholars are unlikely to settle for uninterpretable AI-generated predictions. The search for human-interpretable understanding is thus likely to be the remaining task for human researchers in the humanities over the next decade and beyond.
This research was part of the AI4Europe project (Horizon Europe research and innovation programme, Grant Agreement no. 101070000) and was supported by the Slovenian Research and Innovation Agency (ARIS), grant number P6-0064. The full article is available at https://doi.org/10.3390/fi15100336.