extremetech.com

How Does ChatGPT Work? OpenAI's Groundbreaking Chatbot, Explained

OpenAI logo

Credit: René Ramos; OpenAI

How do you use ChatGPT? What can it do? This is our layperson's guide to what ChatGPT is, how it works, and how to make it work for you. No prior expertise necessary. In the first part of this article we'll answer questions like these:

What does "ChatGPT" mean?

What kind of tasks can ChatGPT perform?

What are its strong points? What does it excel at?

How does ChatGPT work "under the hood"?

In the second part, after your brief crash course on how ChatGPT works, you'll find an overview of different ways to access ChatGPT and how to use it once you've got access.

Before we dive in, here's the simple definition: ChatGPT is an artificial intelligence chatbot based on OpenAI's foundational GPT-4 large language model. It parses the user's natural-language prompts or multimedia input and generates a relevant response, informed by the data used to train the AI.

What Does the 'GPT' Stand For?

ChatGPT's nature is in its name, so let's break that down. The GPT in ChatGPT's name stands for generative pre-trained transformer. A generative AI is a system that generates text, images, or other media in response to prompts from the user. Pre-trained means the model learned from a large body of data before it was ever put in front of users. A transformer is a type of neural net: a software model capable of deep learning that can discern which parts of its input are essential. Transformers take a given sequence of data, weigh its parts against one another, and then return the result.

OpenAI used a semi-supervised approach to pre-train the GPT models that power ChatGPT. In the first unsupervised stage, ChatGPT's programmers loosed the model on their training data sets, prompting it to form its own assumptions about the structure of languages. (Unsupervised learning is critical to ChatGPT's flexibility because it lets the model find patterns in the data on its own, without human-written labels.)

Flowchart of ChatGPT's training and optimization using different prompts: "Explain reinforcement learning to a 6 year old," and "Write a story about otters."

PPO (Proximal Policy Optimization) is the reinforcement learning algorithm used in the fine-tuning stage, helping ChatGPT sanity-check and refine its results.

Credit: OpenAI

The final step in ChatGPT's training is a fine-tuning pass guided by another AI. This serves as a sanity check, correcting ChatGPT's broad-brush assumptions using filtered, structured, and labeled data that was content-moderated by humans. All this happens before the chatbot goes live, patches and hotfixes notwithstanding.

What Can ChatGPT Do?

Broadly speaking: As a generative, multimodal AI, ChatGPT specializes in generating content in response to user prompts. Ask it questions and it will respond. ChatGPT has knowledge up to December 2023, but it can search the web to answer queries that need more up-to-date source material.

ChatGPT's forte is creative writing. ChatGPT and other models of this type can produce blocks of text, create images and video, and even write music. Mobile users can take advantage of the microphone in their phone or tablet, using verbal queries or speech-to-text. ChatGPT Plus subscribers can upload images to the service to inform its process and output. Unlimited access to OpenAI's DALL-E image creator is also included in the paid ChatGPT service tiers.

A glass of cherry limeade with lime slices, on a table at a diner

This glass of cherry limeade was made using DALL-E.

Credit: OpenAI/DALL-E/ExtremeTech

Specifically, ChatGPT makes a solid research gofer, and it's an excellent copyeditor for any text you may wish it to proofread. It can evaluate text for accuracy, offer outlines and ways to rephrase, and summarize web articles and peer-reviewed papers. It can draw up an outline for a topic and then produce its own blocks of text, although results are better when you work through a piece of writing methodically, going step by step instead of asking it to write the whole thing in one response. PCMag, our sister site, also noted back in 2023 that ChatGPT can be a good coding assistant, and in 2024 rated ChatGPT's language translation capabilities highly compared with other services such as Bing or Google Translate.

Because ChatGPT is based on a large language model, it does its work in the domain of language. What it knows about language informs the content it generates. This means grammar matters when you're asking a question or entering a prompt.

An alchemist's workbench full of colorful potions, with a black cat wearing a blue collar sitting on top being helpful. The images in a stained-glass style.

Somehow even an AI knows how to convey a certain je-ne-sais-quoi about cats. Call that CatGPT. Credit: OpenAI/DALL-E/ExtremeTech

Once you learn its internal rules, the service is straightforward, and there's a surprising wealth of different ways to use it that are "off the beaten path." If you struggle with writer's block or get stuck staring at a blank canvas, ChatGPT can help you figure out where to start. We've seen success stories from folks who used ChatGPT to help them plan meals and vacations, host trivia games, write code and Excel formulas, and even beat depression. And it looks like the GPT model family's powers are only increasing. OpenAI has been using its GPT-4 model to do mathematical reasoning with natural language inputs, training the model on questions like "Simplify 'tan 100 + 4(sin 100).'"

Mathematical reasoning with ChatGPT

Credit: OpenAI

OpenAI offers a fair number of third-party upgrades for (paid) ChatGPT Plus users. There are now more than a hundred extensions for ChatGPT from providers like Instacart, OpenTable, and Zillow. You can install as many as you like, but you can only have three enabled at a time.

Still more powerful are the APIs by which developers can interface their projects with OpenAI's GPT-3/3.5/4+ backend. These tools aren't free; developers pay by usage, although the cost is a tiny fraction of a cent per thousand input tokens. In return, OpenAI's various ChatGPT APIs can interface with external APIs, allowing external services to perform sophisticated queries and function calls.
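For a feel of what billing by the thousand input tokens looks like, here is a minimal sketch. The price in it is a made-up placeholder, not OpenAI's actual rate, and the function name is ours; check OpenAI's pricing page for real numbers.

```python
from decimal import Decimal

# HYPOTHETICAL price for illustration only. Real rates vary by model
# and change over time.
PRICE_PER_1K_INPUT_TOKENS_USD = Decimal("0.0005")


def input_cost_usd(token_count, price_per_1k=PRICE_PER_1K_INPUT_TOKENS_USD):
    """Estimate the cost of sending `token_count` input tokens."""
    return Decimal(token_count) / 1000 * price_per_1k


# At the placeholder rate, 1,000 input tokens cost a twentieth of a cent.
print(input_cost_usd(1_000))    # 0.0005
print(input_cost_usd(128_000))  # 0.0640
```

We used Decimal rather than floats so that tiny per-token prices don't pick up rounding noise.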

Finally, OpenAI's GPT Store hosts millions of custom GPTs, which free and paid users can browse and use. These special-purpose mini models allow users to apply the full power of GPT-3+ to their own specific tasks. If you've ever wanted to build your own AI, you can use ChatGPT to do it.

How Does ChatGPT Work?

ChatGPT's backend software is written in Python, and it runs on the Microsoft Azure supercomputer powered by thousands of Nvidia A100 and H100 GPUs. The software is only about 500 gigabytes on disk, but it requires hundreds of gigabytes of VRAM, and it trained on terabytes of data.

ChatGPT's unique powers come from its roots as a transformer. Before tools like ChatGPT began to use natural language, they were already terrific at algorithmically transforming and upscaling images and video. Images can be represented as an array of pixels, where each pixel has its associated values within the colorspace—and videos are a series of images, sometimes with audio waveforms attached. You can give a transformer an image and tell it, "Do X operation to every pixel," and what it returns can be a drastic improvement on the source material. Transformers can even analyze and change the motion of elements in a video, sussing out the movement vectors based on which pixels change between frames.
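To make the "array of pixels" idea concrete, here is a minimal sketch in plain Python. It is not a neural network, just the literal version of "do X operation to every pixel," plus a crude way to spot which pixels changed between two frames. The tiny 2x2 image and the function names are our own inventions.

```python
def brighten(image, amount):
    """Add `amount` to every channel of every pixel, clamped to 0..255."""
    return [
        [tuple(min(255, max(0, channel + amount)) for channel in pixel) for pixel in row]
        for row in image
    ]


def changed_pixels(frame_a, frame_b):
    """List the (row, column) positions whose color differs between frames."""
    return [
        (r, c)
        for r, row in enumerate(frame_a)
        for c, pixel in enumerate(row)
        if pixel != frame_b[r][c]
    ]


# A hypothetical 2x2 image: each pixel is an (R, G, B) tuple.
image = [
    [(10, 20, 30), (250, 128, 0)],
    [(0, 0, 0), (255, 255, 255)],
]

brighter = brighten(image, 10)
print(brighter[0][1])                   # (255, 138, 10)
print(changed_pixels(image, brighter))  # [(0, 0), (0, 1), (1, 0)]
```

The pure-white pixel can't get any brighter, so it is the only one that doesn't show up as changed.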

💡 Before we go any further, you'll want to know what a token is if you didn't already: A token, in machine learning, is a subordinate element of a sentence, phrase, clause, paragraph, or other input form (like elements of an image, or motifs within a piece of music). The path (sequence of words or tokens) a transformer takes to get to its output is informed by how much attention its model thinks it should pay to various tokens and areas of the map.

Tokens, in this example text, represented as a multidimensional vector

Credit: OpenAI

According to OpenAI, there are about 1,000 tokens in 750 words, and ChatGPT's single-input limit is 128,000 tokens.
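If you'd like to turn that rule of thumb into numbers, here is a minimal sketch. It assumes the 1,000-tokens-per-750-words ratio holds evenly, which real text only approximates, and the function names are ours.

```python
# Figures quoted above: about 1,000 tokens per 750 words,
# and a single-input limit of 128,000 tokens.
TOKENS_PER_750_WORDS = 1_000
INPUT_LIMIT_TOKENS = 128_000


def estimate_tokens(word_count):
    """Estimate how many tokens a text of `word_count` words uses."""
    return round(word_count * TOKENS_PER_750_WORDS / 750)


def max_words(token_limit=INPUT_LIMIT_TOKENS):
    """Estimate how many words fit inside `token_limit` tokens."""
    return token_limit * 750 // TOKENS_PER_750_WORDS


def fits_in_one_input(word_count):
    """True if a text of this length should fit in a single input."""
    return estimate_tokens(word_count) <= INPUT_LIMIT_TOKENS


print(estimate_tokens(750))  # 1000
print(max_words())           # 96000
```

By this estimate, the 128,000-token limit works out to roughly 96,000 words.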

Now for the math: Everything ChatGPT does is ultimately powered by calculus, and structured by simple Boolean logic.

Under the hood, these models use vectors (which, remember, have both magnitude and direction) to navigate language as a kind of conceptual landscape, where attributes of language serve as the axes of a Cartesian coordinate system. The relative importance of each token informs the topographical height of features in this semantic landscape. A sentence, or a group of tokens, has a net vector. The model chooses its path through that landscape, working in terms of vectors, using an algorithm called gradient descent.
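Here is a minimal sketch of the "net vector" idea, assuming made-up two-dimensional vectors for three tokens. Real models use hundreds or thousands of dimensions and combine tokens in more elaborate ways than a plain sum, so treat this as a cartoon, not the actual method.

```python
import math

# HYPOTHETICAL 2-D vectors, invented for illustration.
token_vectors = {
    "red": (0.9, 0.1),
    "fox": (0.2, 0.8),
    "sleeps": (-0.3, 0.4),
}


def net_vector(tokens):
    """Add up the vectors of a group of tokens, axis by axis."""
    x = sum(token_vectors[t][0] for t in tokens)
    y = sum(token_vectors[t][1] for t in tokens)
    return (x, y)


def magnitude(vector):
    """Length of the vector."""
    return math.hypot(*vector)


def direction_degrees(vector):
    """Angle of the vector, measured counterclockwise from the x-axis."""
    return math.degrees(math.atan2(vector[1], vector[0]))


sentence = net_vector(["red", "fox", "sleeps"])
print(sentence, magnitude(sentence), direction_degrees(sentence))
```

The sum lands near (0.8, 1.3): one arrow with a magnitude and a direction, standing in for the whole phrase.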

Suppose you put a marble on an uneven surface; it would roll along the path of least resistance to find its lowest-energy resting place. Gradient descent picks a place on that landscape and puts down a metaphorical marble. The AI's output, be it text or image or multimedia, is analogous to the marble's path.
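The marble analogy can be sketched in a few lines. This is gradient descent on a toy bowl-shaped surface we made up, whose lowest point sits at (3, -1). It illustrates the algorithm itself and makes no claim about the far larger landscapes inside ChatGPT.

```python
def height(x, y):
    """A hypothetical bowl-shaped surface with its lowest point at (3, -1)."""
    return (x - 3) ** 2 + (y + 1) ** 2


def slope(x, y):
    """Gradient of `height`: points in the direction of steepest ascent."""
    return 2 * (x - 3), 2 * (y + 1)


def roll_marble(x, y, step_size=0.1, steps=100):
    """Repeatedly step downhill, recording every position along the way."""
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = slope(x, y)
        x -= step_size * dx  # move against the gradient, i.e. downhill
        y -= step_size * dy
        path.append((x, y))
    return path


path = roll_marble(0.0, 0.0)
final_x, final_y = path[-1]
print(final_x, final_y)  # ends very close to (3, -1)
```

Each step moves the marble a little way against the slope, so the list of positions in path is the marble's route to the bottom of the bowl.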

To produce its fluent answers, ChatGPT has to ask itself, "What comes next in this text string?" To solve the problem with gradient descent, ChatGPT picks a starting point on the semantic landscape, a point that corresponds to the language attributes of the starting query. The model ranks its options by how probable each one is to come next, given everything that came before. Then, it takes the highest-value path to where it thinks the user wants to go.
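The "what comes next?" step can be sketched as follows. The candidate words and their scores are invented for illustration. A real model scores tens of thousands of candidate tokens, and it often samples from the probabilities rather than always taking the top one.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that add up to 1."""
    highest = max(scores.values())
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def pick_next_token(scores):
    """Choose the candidate with the greatest probability."""
    probs = softmax(scores)
    return max(probs, key=probs.get)


# HYPOTHETICAL scores for what might follow "The cat sat on the".
candidate_scores = {"mat": 3.2, "sofa": 2.1, "moon": 0.3, "carburetor": -1.5}

probabilities = softmax(candidate_scores)
next_token = pick_next_token(candidate_scores)
print(next_token)  # mat
```

Repeat that step, feeding each chosen token back in as part of the text so far, and you get a sentence one token at a time.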

Transformers can manage what they pay attention to—in effect, they can cordon off parts of the map, depending on what they are programmed to do. More important tokens get preferential attention, and the model decides which tokens are important by looking at a comparator set of natural-language exchanges between humans. For ChatGPT, one important set is the Common Crawl, which indexes a vast number of websites, including Wikipedia, Reddit, StackExchange, and GitHub. Another is the official Ubuntu help forums, which contain more than a million language exchanges between human beings.
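Here is a minimal sketch of how "preferential attention" can be computed, using scaled dot-product attention on a toy three-token input. All of the vectors are made up, and a real transformer learns them during training and runs many such attention steps in parallel.

```python
import math


def attention_weights(query, keys):
    """Score each key against the query, then normalize so the weights sum to 1."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Blend the value vectors, giving more weight to better-matching tokens."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# HYPOTHETICAL vectors for three tokens.
tokens = ["the", "fox", "jumps"]
keys = [[0.1, 0.0], [1.0, 0.9], [0.4, 0.2]]
values = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
query = [1.0, 1.0]

weights = attention_weights(query, keys)
most_attended = tokens[weights.index(max(weights))]
print(most_attended)  # fox
```

The token whose key lines up best with the query gets the largest share of attention, and the output of attend leans toward that token's value.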

A red fox wearing spectacles, sitting on a book on a writer's desk, with a feather pen and a bottle of ink.

Here is your obligatory fox in payment for the math. Credit: OpenAI/DALL-E/ExtremeTech

How to Access ChatGPT

Now that you know how ChatGPT works, here's where to find it: the vanilla ChatGPT tool lives at chatgpt.com. Access is free, but the most advanced features require a paid subscription. Both tiers require you to create an account. The free tier offers users "unlimited messages, interactions, and history" with its GPT-4o ("o" for "Omni") model. Right now, the paid personal subscription starts at $20 per month, and for that outlay, you get access to ChatGPT-4, with additional tools for browsing, media generation, and data analysis. ChatGPT Plus subscribers can also use OpenAI's suite of tools to create their own purpose-built GPTs, although per OpenAI's terms of use, commercial use of those GPTs requires a commercial or enterprise-level subscription.

Splash screen showing the capabilities of ChatGPT-4

Credit: Emiliano Vittoriosi/Unsplash

OpenAI has released an official ChatGPT app for Android and iOS, available to both free users and ChatGPT Plus subscribers. It offers the same features as the website, with an added perk: OpenAI's open-source Whisper speech recognition tool allows mobile users to speak their queries rather than typing them out. macOS desktop users also get access to the ChatGPT coding tool with the free tier.

Microsoft also has several tools that offer access to ChatGPT's powers via the GPT-4+ LLMs. Microsoft has poured billions of dollars into OpenAI, and in 2024 the company integrated OpenAI's GPT-4o model into Copilot (formerly Bing Chat). Several "front-end" services are powered by the GPT-3+ models, including DALL-E 2 and DALL-E 3, the latter behind Bing Image Creator.

One caveat: ChatGPT can only handle so many users simultaneously. More than 100 million people have signed up for the service, most of whom are in the US. This means that, as with game servers, peak use periods fall during the hours when most of those users are awake and online. If too many people try to engage with ChatGPT simultaneously, queue times go up, and its servers can overload.

How to Use ChatGPT

The hardest thing about using ChatGPT might just be figuring out what you want to do with it. Once you've decided on your direction, it's time to start. To engage with ChatGPT, you need to give the AI a prompt. Here's what the interface looks like:

Screencap of the user interface from vanilla ChatGPT

Credit: OpenAI

You don't need to know anything about programming to use ChatGPT; you can write your prompt in regular conversational language, or even copy and paste a block of text up to 25,000 words long. Some simple examples:

I read that you need different canning techniques to put up tomato salsa versus fruit preserves or meat. Is that true?

In three paragraphs, summarize the reasons for the Federal Reserve's last three interest rate adjustments.

In the context of Java programming, can you explain what a switch case is? Is it different than a switch statement?

Evaluate the text I'm about to paste for accuracy and grammar, and suggest ways I can reduce jargon.

Give me an outline for an 800-word blog post that gives an overview of n-type versus p-type semiconductors.

What's the difference between n-type and p-type semiconductors? Explain it like I'm a beginner.

Draw a young woman wearing Victorian clothing in a ballroom with skylights.

A curvaceous female with curly blonde hair, depicted in Victorian English clothing in a ballroom with skylights.

On the whole, results are good, but some errors with her eyes and lips end up plunking the viewer into the Uncanny Valley.

Credit: OpenAI/DALL-E/ExtremeTech

You can refine your results by changing the prompt or telling the AI to refer to a given resource, such as .gov domains, StackExchange, or Wikipedia. More detail in your prompt helps the AI provide a better response. Trial and error is your friend here, and again, grammar matters. Nudging your prompt for better results is called prompt engineering.

Here's a quick example of the creative process: We asked Bing Image Creator (powered by DALL-E) to generate an image of a red fox sitting happily in a pile of autumn leaves.

LOOK AT HIM, HE'S SO GOOD

Those leaves look so crunchy. Credit: Bing Image Generator/ExtremeTech

The fox is good, but the leaves are pretty rough. At a glance, it doesn't detract from the image, but it could be better. So we changed our query and asked the chatbot for "an image of a red fox sitting happily in a pile of autumn leaves. Pay special attention to making the leaves realistic." The result:

Another happy fox in a pile of maple leaves

We didn't specify "maple" leaves; the AI just decided it would depict them.

Credit: Bing Image Generator/ExtremeTech

For those who want to delve more deeply into artistic experimentation with GPT-type neural nets, OpenAI also built DALL-E 3, an 'AI art generator' based on ChatGPT. DALL-E 3 can parse natural-language descriptions and use them to create art of many styles, from abstract to clip art to photorealistic.

A happy dog in a field in front of a barn

A Very Good Dog in a field of wheat. Credit: OpenAI/DALL-E/ExtremeTech

What Are the Drawbacks of Using ChatGPT?

OpenAI bills its GPT-4 model as capable of humanlike performance, able to "see, hear and speak," but clothes don't make the man. ChatGPT is a powerful, sophisticated tool, but for all its polish it relies on generative AI, which is still in its early days. AI has no sense of context and can only do what it's programmed to do, which starts to show when people test its limits.

One of the biggest issues with ChatGPT and all similar programs is called hallucination. The ability to hallucinate is a double-edged sword: hallucination is key to ChatGPT's abilities, but it's also a critical weakness. When an AI hallucinates, it produces output with a known form but mismatched content. Generative AIs, like ChatGPT, use this ability to respond to prompts, creating novel content that's essentially a sophisticated, blenderized remix of what the AI has seen as it trains and learns.

The problem is that ChatGPT isn't always as wise as it is powerful. An AI hallucinating detail can change what characters look like or distort videos and images in unusual ways. While ChatGPT can tell jokes, it can't make up its own; research has shown it tends to return to just a couple dozen, only making simple variations on the theme. More seriously, a recipe chatbot based on GPT-3.5 recently made headlines by suggesting delights such as a "Poison Bread Sandwich," "Thermite Salad," "Bleach-Infused Rice Surprise" and an "Aromatic water mix" containing bleach, water and ammonia. GPT-based tools such as GitHub Copilot can write a functional computer program, even one that includes backdoors or hidden malicious code.

ChatGPT also struggles with accuracy. Any service or tool that scrapes the web is vulnerable to misinformation, and ChatGPT is no exception, although GPT-4 is less likely to confabulate than GPT-3. For example, in September 2023, a deprecated model of GPT-3 confidently told Quora that eggs can melt. (They can't. You can thaw an egg, but you can't melt one.) Quora promptly banned replies composed by ChatGPT, because of its problems with accuracy. AIs powered by large language models, like ChatGPT and its kin, have an annoying tendency to create textual output that is grammatically coherent but factually wrong.

And then there are the legal problems: Concerns related to copyrighted material being ingested into image and video tools have created headaches across the legal landscape. "Fair use" isn't necessarily the same for commercial use as it is for personal or academic purposes. Getty (of Getty Images) and The New York Times have filed suit against OpenAI for copyright infringement, alleging that OpenAI trained ChatGPT on their work and is profiting from their respective intellectual property. Major scientific journals, such as Science and Nature, have banned or sharply restricted AI-generated content.

In practice, this all means that when it comes to the heavy-duty stuff, ChatGPT isn't ready to have the training wheels off quite yet. But the number of services powered by ChatGPT in its various incarnations will only grow over the next few years. Microsoft intends to build AI into Windows through multiple services and applications, including products like Copilot and its Bing search engine. As AI tools gain traction and popularity, we'll move along the hype curve, and the technology will find its place. Until then, we'll be here to demystify ChatGPT and other new technologies, one how-to at a time.

P.S. If you stuck with us this far, enjoy one last complimentary fox. We saved the best for last. 🦊

A red fox in a forest clearing under a starry sky, with fireflies in the background, in a stained-glass style.

Credit: OpenAI/DALL-E/ExtremeTech

How did we do with this explainer? Did we miss a question you wanted to ask? We do read the comments, so please leave your feedback below. Thanks for reading!
