Building the Future: An Overview of GPT-3

GPT-3 Explained

GPT-3, or Generative Pre-trained Transformer 3, is a generative language model built on a neural network, released by OpenAI in mid-2020. Given a prompt, GPT-3 generates text that continues it. The prompt and the generated completion together can span up to 2,048 tokens (word pieces), which works out to roughly 1,500 English words.

GPT-3 is an incredibly powerful type of model known as a transformer neural network. Transformers are a fairly recent innovation in artificial intelligence (see our previous Building the Future: The Promise of AI blog), first described by researchers at Google in 2017. Before then, recurrent neural networks were typically used for tasks like machine translation. However, they had some significant drawbacks.

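Recurrent neural networks (RNNs) process a sentence sequentially, one word at a time, carrying forward a running summary of the words they have already seen. While reasonably effective, this design has two drawbacks. Information from words far back in a sentence tends to fade, so in practice the model relies mostly on the current word and a few predecessors. And because each step must wait for the previous one, the work cannot be spread efficiently across many processors. RNNs were therefore slow to train, which limited them to relatively small training datasets.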
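The sequential bottleneck is easy to see in code. Below is a minimal sketch of a toy RNN that keeps a single hidden number instead of a full vector. The weights (`w_in`, `w_rec`, `bias`) are arbitrary illustrative values, not trained ones, and each input float is a hypothetical stand-in for a word. Every step needs the previous step's hidden state, so the loop cannot run in parallel across the sentence, and in this toy the first word's effect on the final state shrinks as the sentence grows.

```python
import math
from typing import List


def rnn_hidden_states(inputs: List[float], w_in: float = 0.5,
                      w_rec: float = 0.9, bias: float = 0.0) -> List[float]:
    """Toy single-unit RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias).

    The default weights are made-up illustrative values, not trained ones.
    """
    h = 0.0  # hidden state before the first word is read
    states = []
    for x in inputs:
        # Step t cannot begin until step t-1 has produced h, so this loop
        # is inherently sequential.
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


def first_word_influence(length: int) -> float:
    """How much the final hidden state changes when only the first input changes."""
    baseline = [0.0] * length
    nudged = [1.0] + [0.0] * (length - 1)
    return abs(rnn_hidden_states(nudged)[-1] - rnn_hidden_states(baseline)[-1])


# Each float stands in for one hypothetical word in a four-word sentence.
sentence = [1.0, -0.5, 0.25, 2.0]
for t, h in enumerate(rnn_hidden_states(sentence)):
    print(f"step {t}: h = {h:.3f}")

for n in (2, 10, 50):
    print(f"length {n}: first word moves final state by {first_word_influence(n):.4f}")
```

With these weights the first word's influence on the last hidden state falls steadily as the sequence gets longer, a simplified picture of why RNNs leaned on nearby words.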

Transformer neural networks solved these problems by pairing positional encoding with attention. A transformer like GPT-3 represents each word as a vector, a list of numbers that serves as the word’s unique identifier, and positional encoding adds information about where that word sits in the phrase or sentence. Because position is built into each vector, the model no longer has to read words one at a time. Attention then lets the model consider many words at once, weighing how relevant each one is, while translating, completing text, or performing some other language function.
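One common way to build positional encodings is the fixed sine-and-cosine scheme from the original Transformer paper. GPT-style models instead learn their position vectors during training, so treat this minimal sketch as an illustration of the idea rather than GPT-3’s exact mechanism. The 4-number word vector `cat` is made up for the example.

```python
import math
from typing import List


def positional_encoding(position: int, d_model: int) -> List[float]:
    """Sinusoidal positional encoding from the original Transformer paper.

    Even dimensions use sine and odd dimensions use cosine; each pair of
    dimensions oscillates at a different frequency, so every position
    gets a distinct vector.
    """
    vec = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
        freq = 1.0 / (10000 ** ((i - i % 2) / d_model))
        angle = position * freq
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def add_position(word_vector: List[float], position: int) -> List[float]:
    """Sum a word vector with the encoding for its position in the sentence."""
    pe = positional_encoding(position, len(word_vector))
    return [w + p for w, p in zip(word_vector, pe)]


# The same hypothetical word vector placed at two different positions.
cat = [0.1, 0.3, -0.2, 0.5]
print(add_position(cat, 0))
print(add_position(cat, 3))
```

The same word yields a different input vector at each position, which is how the model can tell “dog bites man” from “man bites dog” without reading the words in order.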

Attention is computed from the neural network’s learned weights, or “parameters”, which help the model decide which words in a sentence matter most, based on patterns it picked up from training data. GPT-3 has around 175 billion parameters. These weights help the model effectively “understand” words based on the context of their neighbors, and then decide which words to attend to most.
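To make attention concrete, here is a minimal sketch of scaled dot-product attention, the form used in the original Transformer. It adds a causal mask so that each word can only look at itself and the words before it, as in GPT-style text completion. In a real model the query, key, and value vectors are produced by multiplying word vectors by learned weight matrices, which are among the parameters mentioned above. Here, to keep the example short, three hand-picked 2-number token vectors are used directly as queries, keys, and values. The function names are our own.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_attention(queries: Matrix, keys: Matrix,
                     values: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention where token t sees only tokens 0..t.

    Returns (outputs, weights); weights[t][j] is how much token t
    attends to token j.
    """
    n, d_k = len(keys), len(keys[0])
    outputs: Matrix = []
    weights: Matrix = []
    for t, q in enumerate(queries):
        # Similarity of this token's query to each visible key,
        # scaled by sqrt(d_k) as in the original Transformer.
        scores = [dot(q, keys[j]) / math.sqrt(d_k) for j in range(t + 1)]
        w = softmax(scores)
        # Later tokens are masked out, so they receive zero weight.
        weights.append(w + [0.0] * (n - t - 1))
        # The output is a weighted average of the visible value vectors.
        outputs.append([sum(w[j] * values[j][i] for j in range(t + 1))
                        for i in range(len(values[0]))])
    return outputs, weights


# Three hypothetical 2-number token vectors, used directly as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(tokens, tokens, tokens)
for row in weights:
    print([round(x, 3) for x in row])
```

Each printed row sums to 1. The first token can only attend to itself, while the third spreads its attention over all three, giving the most weight to the token whose key best matches its query.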

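The most capable version of GPT-3 was trained on roughly 300 billion tokens of text. Most of that text came from Common Crawl, a large scrape of the public web that OpenAI filtered down from about 45 terabytes of compressed text to 570 gigabytes, supplemented by curated web pages, books, and Wikipedia. Estimates indicate that training the model cost OpenAI over $4 million, and the results are incredibly impressive.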

In a blind test, participants were shown news articles of roughly 200 words, some written by GPT-3 and some by human beings, and asked to judge which were machine-written. Their accuracy at telling the model’s articles apart from human ones was only about 52%, barely better than the 50% expected from random guessing.

Other Abilities

GPT-3 is designed to be a highly proficient language model, but some of its abilities with basic math are perhaps more impressive. Asked to complete a basic single-digit addition problem, GPT-3 has no trouble giving correct answers. Many single-digit problems would likely appear in the model’s training data, though, so giving the right answer is little more than an act of memorization.

However, OpenAI wanted to test whether their model had picked up skills outside of language. They tested it on 2,000 three-digit addition problems and 2,000 three-digit subtraction problems. Only 17 (0.85%) of the addition problems and 2 (0.1%) of the subtraction problems appeared anywhere in the model’s training data.

Yet when given a few worked examples in its prompt, the model answered the addition problems with more than 80% accuracy and the subtraction problems with more than 90% accuracy. Since fewer than 1% of these problems appeared in GPT-3’s training data, memorization cannot account for those scores; the model is not simply recalling arithmetic it has seen before.
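The memorization argument comes down to simple arithmetic, sketched below with the counts reported above. If the model had answered every problem it had seen correctly and every unseen one wrongly, memorization would cap its accuracy at the share of problems it had seen. The last lines compare how many problems exist at each size. They assume that “three-digit” means both operands run from 100 to 999, which is our own simplification of how the test problems were sampled.

```python
def share(count: int, total: int) -> float:
    """Return count / total as a percentage."""
    return 100.0 * count / total


PROBLEMS = 2000        # problems per operation in the test described above
seen_addition = 17     # addition problems found in the training data
seen_subtraction = 2   # subtraction problems found in the training data

# Best case for memorization: every seen problem right, every unseen one wrong.
print(f"addition: at most {share(seen_addition, PROBLEMS):.2f}% from memorization")
print(f"subtraction: at most {share(seen_subtraction, PROBLEMS):.2f}% from memorization")

# Why memorizing is plausible for single digits but not for three digits.
single_digit_pairs = 10 * 10    # operands 0-9
three_digit_pairs = 900 * 900   # operands 100-999 (our assumption)
print(f"single-digit pairs: {single_digit_pairs}")
print(f"three-digit pairs: {three_digit_pairs}")
```

A ceiling below 1% is nowhere near the 80% and 90% scores, and a table of 810,000 sums is a very different thing to memorize than a table of 100.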

Instead, OpenAI’s official report suggests the model has developed some ability to carry out these calculations on its own. Looking through the incorrect answers, the researchers noticed an important trend: “the model often makes mistakes such as not carrying a ‘1’, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table”.
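A dropped carry leaves a recognizable fingerprint, which is why it points to computation rather than lookup. The toy column-addition routine below is our own illustration, not anything from OpenAI’s evaluation. With `carry=False` it throws away every “1” that should move to the next column, so its answer is wrong in exactly the columns that should have received a carry and right everywhere else.

```python
def add_digits(a: int, b: int, carry: bool = True) -> int:
    """Add two non-negative integers column by column, right to left.

    With carry=False, any '1' that should move to the next column is
    dropped, reproducing the kind of slip described above.
    """
    result, place, carry_digit = 0, 1, 0
    while a > 0 or b > 0 or carry_digit:
        column = a % 10 + b % 10 + carry_digit
        result += (column % 10) * place          # keep this column's digit
        carry_digit = column // 10 if carry else 0
        a, b, place = a // 10, b // 10, place * 10
    return result


print(add_digits(457, 385))               # correct column addition
print(add_digits(457, 385, carry=False))  # same sum with the carries dropped
```

Here the correct sum is 842, while the carry-dropping version gives 732: the ones digit matches, but the tens and hundreds digits, which should each have received a carried 1, come out one too small.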

Much of the AI community actively cringes upon hearing reports like these. Not because the results are unimpressive, but because this information provides plenty of fuel for those who believe certain AI models are presently capable of sentient thought.

Indeed, GPT-3 is an incredibly powerful system, capable even of completing tasks it was not explicitly designed for, but nothing about the model’s performance suggests that it is sentient. Rather, it is remarkably effective at performing functions we typically associate with sentient beings.

The Dura Digital Takeaway

Ultimately, it is perhaps unsurprising that a transformer neural network trained on roughly 300 billion tokens of text can write articles almost indistinguishable from those written by humans, and that it developed some method for completing basic math problems along the way. At the very least, GPT-3 is a testament to the remarkable computing power of contemporary neural network models.

At Dura Digital, we continually invest in learning new technologies so that we can provide you, our customers, with broad-scale insights and awareness that help you transform your business. Contact us for more details on how we can help you advance your business with the latest in artificial intelligence.