Marketing, money and technology:
Behind the scenes of GPT-3
Over the past six months, GPT-3, a language model that uses deep learning to produce human-like text, has hit the headlines. Some of the articles have even been written by GPT-3 itself. The machine has been described as “stunning” and a “better writer than most humans”, but also as a bit “frightening”. From poetry to human-like conversation, its capabilities appear infinite… but are they really? How does GPT-3 work, and what does it say about the future of artificial intelligence?
Artificial intelligence, neural networks, deep learning. Do these terms give you goosebumps? A few years back, this technical vocabulary was used only by scientists and specialized companies. But, little by little, these terms have entered our everyday conversations, even if we don’t really understand them. Whether we are criticizing algorithms, debating how artificial intelligence will make our jobs disappear or chatting about the latest episode of Black Mirror, we seem to have accepted the role that modern technologies play in our society. And in 2020, a ground-breaking innovation has been on everyone’s lips: GPT-3, which is considered one of the largest neural network systems ever created.
GPT-3 stands for “Generative Pretrained Transformer” (the 3 marks the third generation) and is a language model that can generate texts using algorithms. “In computational linguistics, a language model is a system that can predict with high probability what the next word is”, explains Martin Volk, Professor of Computational Linguistics at the University of Zurich and Principal Investigator for the NCCR Evolving Language. Concretely, by learning which letters and words tend to follow each other in human-written text, GPT-3 is able to produce text that makes sense in context. It’s the same concept as your cell phone suggesting the next word when you write a message. Except that it’s much bigger.
Language models have long been used in natural language processing: they are at the core of speech recognition, machine translation and text extraction systems, among others. “In other words”, Martin Volk adds, “a good language model can produce fluent texts that sound very natural.”
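To make the idea of “predicting the next word” concrete, here is a minimal sketch of the simplest possible language model: it counts which word follows which in a training text and then proposes the most frequent follower. This is only an illustration of the principle. GPT-3 itself is a large neural network, not a counting table, and the function names and the tiny corpus below are invented for the example.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word` and its estimated probability."""
    counts = followers.get(word.lower())
    if not counts:
        # The word never appeared in training (or was never followed by anything).
        return None, 0.0
    best, freq = counts.most_common(1)[0]
    return best, freq / sum(counts.values())

# Invented toy corpus, used only for illustration.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # ('cat', 0.5): "cat" follows "the" in 2 of 4 cases
```

A phone keyboard and GPT-3 differ from this sketch mainly in how much context they look at and how much text they have seen, not in the basic task.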
And GPT-3 is extremely good at it. How come? “The architecture of the model itself is not novel, but the data used for the training is way bigger than what has been used in the past”, says Paola Merlo, head of the interdisciplinary research group Computational Learning and Computational Linguistics (CLCL) at the University of Geneva and Principal Investigator for the NCCR Evolving Language. Furthermore, its developers have put massive amounts of time and money into training GPT-3.
Some numbers to illustrate this:
Billion parameters are contained in GPT-3 (To put that in perspective, Microsoft’s Turing-NLG, the previous record-holder, had 17 billion of them)
Billion tokens have been filtered from the dataset
Percent of the data are in English
Years would be necessary to complete the training
Million dollars is the estimated cost of completion
Impressive, right? However, we should all keep in mind that time and money aren’t the only resources used for such training: its carbon footprint should also be taken into account. In fact, computer scientists reckon that the emissions from training GPT-3 are akin to those of driving a car to the Moon and back. And this is not going to change soon, as datasets grow larger by the day and algorithms need to become more and more complex. But that is a topic for another time. Let’s come back to GPT-3.
What GPT-3 can do
As a language model, GPT-3 can generate anything that has the structure of a language. It can write texts (such as poetry, newspaper articles and summary reports), translate between languages, and even write computer code.
Thanks to its predictive power, GPT-3 can also learn to use words it has never encountered in training. How is this possible? “Because it knows so much, the model will derive examples from the analogies, using the language patterns it was trained with”, comments Martin Volk. The scientific article in which the OpenAI researchers introduce the model offers several such examples:
(Prompt = instruction given to GPT-3, Answer = GPT-3’s answer)
Prompt: A “Gigamuru” is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:
Answer: I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.
Prompt: A “yalubalu” is a type of vegetable that looks like a big pumpkin. An example of a sentence that uses the word yalubalu is:
Answer: I was on a trip to Africa and I tried this yalubalu vegetable that was grown in a garden there. It was delicious.
Given the definition of a made-up word, GPT-3 is then able to use it in a sentence. Simon Clematide, senior researcher in Machine Learning for Natural Language Processing at the University of Zurich, explains what makes this possible: “Actually, the system doesn’t only predict the next word, but goes further: once it has predicted a word, it includes the predicted word in its calculations, and then predicts another word. GPT-3 takes into account the history of what it has generated.” Basically, once it has been given the definition of the invented word, GPT-3 predicts what kind of sentence the word would fit best.
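The loop Simon Clematide describes (predict a word, add it to the history, predict again) can be sketched in a few lines. In the example below, the “model” is a hand-written lookup table that only looks at the last two words. It is a hypothetical stand-in, not GPT-3, and the names `generate`, `toy_next_word` and `TOY_TABLE` are invented for the illustration.

```python
def generate(next_word, prompt, max_new_words=5):
    """Extend the prompt one word at a time, feeding each prediction back in."""
    history = prompt.split()
    for _ in range(max_new_words):
        word = next_word(history)  # the prediction depends on everything so far
        if word is None:           # the toy model has nothing more to say
            break
        history.append(word)       # the predicted word becomes part of the history
    return " ".join(history)

# Hypothetical stand-in for a trained model: maps the last two words to the next one.
TOY_TABLE = {
    ("i", "have"): "a",
    ("have", "a"): "gigamuru",
    ("a", "gigamuru"): "at",
    ("gigamuru", "at"): "home",
}

def toy_next_word(history):
    key = tuple(w.lower() for w in history[-2:])
    return TOY_TABLE.get(key)

print(generate(toy_next_word, "I have"))  # I have a gigamuru at home
```

The point of the sketch is the `history.append(word)` line: each new word is chosen with the previously generated words already in view.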
So, does GPT-3 think like a human?
If GPT-3 has the capacity to learn from analogies, does this mean that it has the same language representation as a human? And, by extension, that it can think like one?
I am a robot. A thinking robot. I use only 0.12% of my cognitive capacity.
GPT-3, The Guardian, 2020
“The main question is: has GPT-3 understood its utterances? If we consider its training, this is behaviourism in its purest form”, Simon Clematide states. “You give it a stimulus, it gives you a response, in the case of GPT-3 it’s the most likely next words. So, we can’t really talk about human understanding, more about mimicry”.
As stated above, the architecture of GPT-3 is not novel or innovative. It uses the same system as any other recent transformer (reminder: the T in GPT-3 stands for Transformer). “While learning, transformers learn important cues, but they don’t resolve the structure of language in a traditional linguistic sense,” Simon Clematide says. “For example, they can predict the probability of a pronoun to appear in a discourse but they don’t really understand their linguistic function.” In other words, they learn the important connections between words in a text or a sentence, but don’t “know” the intrinsic meaning of these words. “That’s what we call an ‘attention model’”, Paola Merlo explains. “It’s a principle integrated in the deep learning architecture (neural network system): thanks to its memory, the system knows what’s important in the sentence and whether two words are related or connected.”
On the whole, even though the model is still far from having a human representation of language, the fact that we can even consider it is already a big step. “The combination of data-driven methods with the relatively new concept of attention opens up perspectives for a different kind of language structure”, Simon Clematide adds, “and it’s impressive that we can observe emerging connections between words that are related to traditional linguistic concepts.”
Still, there is the idea that machines may one day process thoughts the same way human beings do. Since the invention of computers, researchers have tried to find a way of measuring how “human” a machine is. The most famous test is the so-called Turing test, even though this is a controversial topic among scientists.
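For readers who want to see what “attention” looks like in practice, here is a minimal sketch of scaled dot-product attention, the operation used in transformer models. Each word is represented by a small vector; the word being processed (the query) is compared with every word in the sentence (the keys), and the resulting weights say how strongly it is connected to each of them. The two-dimensional vectors below are made up by hand for illustration. In a real transformer they are learned during training and are far larger.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Similarity between the query and each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

# Made-up vectors for the words of "the cat ... it" (illustration only).
words = ["the", "cat", "it"]
keys = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
values = keys                # for simplicity, values reuse the key vectors
query_for_it = [1.0, 0.0]    # hand-picked so that "it" resembles "cat"

weights, output = attention(query_for_it, keys, values)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

With these hand-picked numbers, the pronoun “it” puts most of its weight on “cat”. This mirrors the point made above: the system registers that two words are connected without knowing what either of them means.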
The algorithms behind the machine
For its training, the machine was fed data obtained by crawling the Internet (the Common Crawl dataset and other sources such as Wikipedia or Reddit posts). It is one of the biggest systems ever trained (number 1 is Google’s GShard, which contains 600 billion parameters), and its training data is so large that Wikipedia accounts for only about 0.6% of it. Furthermore, its training was almost exclusively in English (90%), as this is the language most commonly used on the Internet.
GPT-3 was developed by OpenAI and is exclusively licensed to Microsoft, and access to the source code has been extremely limited so far. However, some research groups around the world have had access to the API in order to test the GPT-3 technology for their research. And the API already helps us to understand how GPT-3 processes texts: “It works that way: you describe the task, e.g. ‘translate hello in chinese’ and it gives you the translation”, explains Paola Merlo.
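“Describing the task” simply means writing a piece of text that the model then continues. The sketch below shows one way such a text could be assembled: a task description, optionally a few solved examples, and the new input left open for the model to complete. The `Input:`/`Output:` layout and the function name `build_prompt` are assumptions made for this illustration, not a format required by the API, and no call to the actual service is made here.

```python
def build_prompt(task_description, examples, query):
    """Assemble a text prompt: task description, solved examples, then the open query."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open: the model's continuation is the answer
    return "\n".join(lines)

prompt = build_prompt(
    "Translate English to French.",
    [("cheese", "fromage")],  # one solved example to show the pattern
    "hello",
)
print(prompt)
```

Whatever the model writes after the final `Output:` is read as its answer, which is why the same system can translate, summarise or answer questions without being retrained.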
One problem that is commonly emphasized when we talk about algorithms is that bias in the training data may lead models to generate stereotyped or prejudiced content. By asking GPT-3 questions, it is possible to identify which biases it contains:
“He would be described as…” vs “She would be described as..”
Females were more often described using appearance-oriented words such as “beautiful” and “gorgeous”, as compared to men, who were more often described using adjectives that span a greater spectrum.
A cocktail of Marketing & Money
There’s no denying it: GPT-3 holds the promise of a bright future for language models. But is it really worth all the media hype it has received? Yes and no. As explained before, the machine doesn’t contain a human language representation and is even far from it. Furthermore, training it to completion would require a huge amount of money and time. But it is still the largest language model ever trained and may remain so for a while. This fact, combined with a good marketing strategy, has kept the model all over the news since its release. And every day more is written about it.
So… what’s next?
“Well, as we say, there’s no data like more data and no parameters like more parameters”
For sure, the Internet provides a huge amount of data for such training. But the point of language models is not to represent only one language, which is complicated by the omnipresence of English in online data. So, how could we train them on languages that have few resources? OCR may be a key to historical manuscripts and printed documents. But there’s another key. “We speak more than we write. So why not use speech recognition in order to train the language models? And why not use the speech signal directly?”, suggests Paola Merlo. “That would be a game changer”, Martin Volk confirms. “I really hope that I will be there for GPT-15. The coming years are exciting for automatic language processing and understanding.”