Potent, but only moderately intelligent
How smart is ChatGPT really? Prof. Martin Volk and Prof. Paola Merlo, researchers at the NCCR Evolving Language, are testing the chatbot and developing their own smart language models that are more efficient, greener, and fairer.
by Roger Nickl.
Since ChatGPT was launched by the American company OpenAI in November last year, the chatbot has dominated media headlines, and it is used by millions of people worldwide. The AI-based chatbot has amazing capabilities: it simulates human intelligence and can write not only texts but also software programs, summarize articles and present complex issues in simplified form. Pupils use ChatGPT to help with their homework, students learn with it, and the language model has also found its way into research. The possibilities of the AI-powered language system seem almost limitless. But how smart is ChatGPT really? And can it already hold a candle to us humans when it comes to writing and research?
A gap between humans and robots
In a current study, Martin Volk is examining how texts written by ChatGPT differ from those written by humans. To do so, the computational linguist, who conducts research at the University of Zurich and at the NCCR Evolving Language, has given the AI-assisted language program specific tasks. Together with his doctoral student Anastassia Shaitarova, he presented ChatGPT with the title and first paragraph of a series of journalistic articles and asked it to write the next 500 words of the text in German and English. The two researchers then compared the human and machine texts.
“Our hypothesis was that humans, in this case professional journalists, produce more coherent texts and therefore use more discourse particles than a pre-trained language model like ChatGPT,” explains Martin Volk. Accordingly, the researchers focused their analysis specifically on the use of linking particles such as “although”, “thus” or “nevertheless” in the texts. The researchers’ assumption was confirmed: the journalistic articles used a good ten per cent more connecting particles than the computer-generated ones. “This means that although the texts from ChatGPT come across as very fluent, they are still clearly different from human-made ones,” says Martin Volk. However, the gap is gradually closing.
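The comparison the researchers describe boils down to counting discourse connectives and normalizing by text length. A minimal sketch in Python of that kind of count; the connective list and the example texts here are illustrative placeholders, not the study’s actual lexicon or data:

```python
# Sketch of a connective-frequency comparison, as described in the study.
# CONNECTIVES is an illustrative placeholder list, not the study's lexicon.
import re

CONNECTIVES = {"although", "thus", "nevertheless", "however", "therefore"}

def connective_rate(text: str) -> float:
    """Connectives per 100 word tokens."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in CONNECTIVES)
    return 100.0 * hits / len(tokens)

# Toy example texts, invented for illustration:
human_text = "Although prices rose, demand held; thus the market stayed calm."
machine_text = "Prices rose. Demand held. The market stayed calm."

print(connective_rate(human_text))    # higher rate: 2 connectives in 10 tokens
print(connective_rate(machine_text))  # lower rate: no connectives
```

A real analysis would use a curated German and English connective lexicon and proper tokenization, but the per-text rate comparison works the same way.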
Constant optimization
According to the latest results of the study, there is now only a difference of a few per cent, whereas the researchers were still able to determine a clear difference just two and a half months ago. “In the near future, it will no longer be possible to distinguish between texts generated by ChatGPT and those written by humans,” adds Volk. “This applies to most everyday texts for which there are many templates – such as letters of recommendation and letters of application, but also to scientific articles, millions of which are available online.” From these, the system can learn and improve wonderfully.
This is because ChatGPT’s capabilities are constantly being optimized. On the one hand, thanks to artificial intelligence, the language model can improve itself to a certain extent and learn from “experience” (for example, from the questioning behavior of users); on the other hand, the system is also constantly fed with a large portion of human intelligence in order to improve its performance. ChatGPT may behave smartly to a certain extent, but its great ability is to imitate human intelligence. The developers train the language model particularly intensively in areas where the chatbot lags behind human texts, such as text coherence. The same applies to general and frequently asked questions, such as: “ChatGPT, do you have consciousness?” or “Are you intelligent?” “Anyone developing a chatbot will first make sure that these questions are answered properly,” says computational linguist Volk.
Finding a niche
In order to explore the possibilities and limits of the language system and to minimize the corrective influence of the developers, Martin Volk therefore turned to a niche topic that is hardly in demand and accordingly not on the radar of ChatGPT’s operators. “We wanted to know how the system handles texts from the 16th century,” says the researcher, “specifically letters by the Zurich reformer Heinrich Bullinger, written in Latin and in the German of the early modern period.” The researchers could safely assume that OpenAI had not specifically optimized the model in this area.
The results of this analysis astonished the scientists: ChatGPT translated the Latin texts into German and English better than Google Translate; the system automatically recognized the proper names in the Latin letters and linked them to Wikipedia; and it was even possible to reconstruct lost letters from Bullinger’s extensive correspondence to some extent. AI-driven language models could therefore also become interesting for historical research, says Volk, because they could be used to reconstruct missing letter texts on a large scale – something that has so far only been possible through painstaking manual work. This could be helpful in the case of a large correspondence such as Heinrich Bullinger’s, where several thousand letters are presumed to have been lost.
The tests Martin Volk carries out in his research also give him clues as to how AI-based language systems could be further developed and improved in the future. The computational linguist not only analyses existing applications such as ChatGPT, but also develops his own systems. The online translation service TextShuttle, which his start-up company of the same name offers commercially, is based on such a system. “In the future, when we develop new language models, it will also be a matter of finding interesting niches that are socially relevant but of little interest to big players such as OpenAI or Google,” says the scientist – for example, applications for historical languages, for small languages or for sign languages.
A need for more sustainable models
The computational linguist Paola Merlo is also working on innovative AI-based language models. “To develop ChatGPT, huge amounts of data were needed and billions of parameters were trained,” says the scientist, who conducts research at the University of Geneva and at the NCCR Evolving Language, “which is incredibly expensive and consumes an enormous amount of electricity.” It is therefore important, she says, to develop new language models that are smaller and cheaper, but also more energy-efficient and thus more sustainable and environmentally friendly. Such alternative language systems are being worked on at various universities in the USA, but also in Geneva under the leadership of Paola Merlo.
“Evaluations have shown that large language models like ChatGPT do not work very well in terms of induction, which means they can hardly abstract or generalize,” says the researcher. ChatGPT generates its texts by probabilistically predicting one word at a time, on the basis of big data and enormous computing power. In this sense, ChatGPT is very powerful, but only moderately “intelligent”. The system can process huge amounts of data and write new texts on that basis – but it can hardly derive higher-level rules from its templates. This is exactly what people do when they learn a language: they connect example sentences, say for forming verbs, with grammatical rules. Paola Merlo now wants to teach this ability to smart machines as well. “We are trying, with relatively small data sets, to get the language system to learn rules from examples automatically,” explains the researcher.
Grammar rules for robots
That’s why the computational linguist feeds her program example sentences from different languages, illustrating various grammatical properties of verbs – for example, causality in English or agreement in French. For this purpose, she has developed specific training tasks inspired by classical intelligence tests. “If we manage to get the system to derive and learn grammatical rules from text templates, we will be able to build powerful language models that require much less data and computing power,” continues Merlo, “because rules are much more compact than the endlessly long series of examples from which ChatGPT, for example, generates its texts. That would not only be more environmentally friendly, but also more elegant.”
Paola Merlo is still working on the basics of such more efficient language models. In the future, however, such systems could enable not only better-quality, cheaper and more sustainable applications, but also applications for smaller languages. This is because ChatGPT was, and is, primarily trained on text data from large languages such as English, German, French or Spanish, so the chatbot works particularly well in these languages. The opposite is true of smaller languages, for which fewer texts are available online. “Smaller and smarter models could also enable well-functioning applications for smaller language communities,” states Paola Merlo. “They would thus also be a contribution to more technological justice.”
Computational linguist Paola Merlo is convinced that smart language models can bring many benefits to us all. “If chatbots are smart, secure and trustworthy, they could serve us as personal assistants in the future, for example,” says the researcher. “I would certainly be happy if someone organized flights and hotels for me for a trip – travel agencies that do that for you hardly exist anymore.”