
What Is a GPU? AI and Gaming's Most Important Component, Explained

Nvidia GPU

GPUs are crucial to modern computing. You're probably reading this on a screen that's making use of a GPU. But what is a GPU? What are they good for? Join us for a layman's overview.

A graphics processing unit (aka a GPU, graphics card, or video card) is a programmable electronic circuit designed to accelerate computer graphics and image processing. Virtually every computing device that attaches to a screen has some type of dedicated graphics hardware, though GPUs are also often used in "headless" mode, with no display attached, when deployed for cloud computing or AI processing.

The term "GPU" dates to the launch of the original GeForce 256 in 1999. Before this point, GPUs were referred to as "graphics cards," "video cards," or "3D accelerators" when discussing then-new hardware based on 3dfx, ATI, or previous Nvidia designs like the Riva TNT and TNT2. Nvidia introduced and justified the term by pointing to the chip's hardware transform and lighting (T&L) unit. Competing cards still handled this task in software. Hardware T&L became standard across the industry, and the term "GPU" soon became common shorthand.

GPUs began as a type of ASIC, or Application-Specific Integrated Circuit. As the name implies, an ASIC is an integrated circuit optimized for one specific task. Early GPUs like the GeForce 256 or the original ATI Radeon had fixed-function pixel and vertex pipelines that developers could not program. This changed over time, beginning with the programmable pixel and vertex shaders introduced alongside DirectX 8. The arrival of unified shaders with the G80 in 2006 and Nvidia's CUDA in 2007 kicked off a new era in programmable graphics, one that led directly to the current AI era and the widespread use of GPUs for workloads far beyond anything the original GeForce 256 could handle.
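To make "programmable" concrete, here's a minimal CUDA sketch (illustrative, not production code) of the kind of general-purpose arithmetic a modern GPU will happily run. It has nothing to do with drawing triangles, yet every element of the calculation gets its own GPU thread:

    // saxpy.cu -- minimal example of general-purpose code running on a GPU (illustrative).
    // Build with: nvcc saxpy.cu -o saxpy
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // Each GPU thread handles one element: y[i] = a * x[i] + y[i].
    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;                       // ~1 million elements
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

        float *dx, *dy;
        cudaMalloc(&dx, n * sizeof(float));
        cudaMalloc(&dy, n * sizeof(float));
        cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        // Launch enough 256-thread blocks to cover all n elements.
        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);
        cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);

        printf("y[0] = %f (expected 5.0)\n", hy[0]);  // 3*1 + 2 = 5
        cudaFree(dx);
        cudaFree(dy);
        return 0;
    }

Swap the arithmetic for a physics step, a matrix multiply, or a neural-network layer and the structure barely changes, which is why this programming model opened the door to so many non-graphics workloads.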

Discrete vs. Integrated GPUs

Today, GPUs are deployed either as discrete cards or integrated onto the same physical die as the CPU. The vast majority of GPUs shipped each year are integrated: all smartphones and most computers rely on integrated graphics, though gaming PCs and workstations are a notable exception to this trend. Gaming, workstation, and scientific computing workloads all tend to rely on discrete GPUs, meaning GPUs with dedicated memory on a separate PCB that attaches to the computer via the PCI Express bus. Most discrete GPUs use an x16 PCIe slot, but x8, x4, and even x1 cards have all been manufactured. Larger lane counts allow higher maximum transfer bandwidth, which is why only lower-end cards are typically available in x4 or x1 configurations.
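As a rough illustration of why lane count matters, the back-of-the-envelope sketch below assumes a PCIe 4.0 link (16 GT/s per lane with 128b/130b encoding); real-world throughput is somewhat lower:

    // pcie_bw.cpp -- rough theoretical PCIe link bandwidth by lane count (illustrative).
    // Assumes PCIe 4.0: 16 GT/s per lane with 128b/130b encoding.
    #include <cstdio>

    int main() {
        const double gt_per_lane = 16.0;                          // transfers per second, in GT/s
        const double encoding    = 128.0 / 130.0;                 // 128b/130b line-code overhead
        const double gb_per_lane = gt_per_lane * encoding / 8.0;  // ~1.97 GB/s per lane, per direction

        const int lane_counts[] = {1, 4, 8, 16};
        for (int lanes : lane_counts) {
            printf("x%-2d link: ~%4.1f GB/s per direction\n", lanes, lanes * gb_per_lane);
        }
        return 0;
    }

Run it and an x16 link works out to roughly 31.5 GB/s in each direction, while an x1 link manages only about 2 GB/s, which is why budget cards are the only ones that ship with so few lanes.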

Discrete GPUs tend to offer roughly the same feature set as integrated GPUs, provided the two parts belong to the same GPU family. What separates discrete cards from integrated graphics is the size of the GPU and the fact that discrete cards enjoy a dedicated pool of high-speed memory. Integrated GPUs, in contrast, must share main memory bandwidth with the CPU. AMD APUs like the Ryzen 7 8700G have a 65W TDP shared between the CPU and GPU, while discrete graphics cards commonly have TDPs between 150W and 450W. That additional power and thermal headroom allows for a much larger array of graphics processors and anywhere from 4GB to 24GB of dedicated memory, depending on the card you buy.
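If you have Nvidia's CUDA toolkit installed, a few lines of code will report how large that dedicated array actually is on your own card. This sketch simply asks the runtime for the compute unit count, memory size, and memory bus width:

    // devinfo.cu -- query a GPU's compute units and dedicated memory (illustrative).
    // Build with: nvcc devinfo.cu -o devinfo
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
            printf("No CUDA-capable GPU found.\n");
            return 1;
        }
        printf("GPU:              %s\n", prop.name);
        printf("Compute units:    %d SMs\n", prop.multiProcessorCount);
        printf("Dedicated memory: %.1f GB\n", prop.totalGlobalMem / 1e9);
        printf("Memory bus width: %d bits\n", prop.memoryBusWidth);
        return 0;
    }

On a discrete card you'll see a wide memory bus and many gigabytes of dedicated VRAM; on an integrated GPU, the reported memory is carved out of the same pool the CPU uses.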

Top-down view of the Nvidia RTX 4090 GPU with its heat sink removed

The RTX 4090 Founders Edition, with its heat sink removed.

Credit: Nvidia

The image above shows a discrete GPU with its heat sink removed. All dGPU boards include the GPU itself and a dedicated pool of GDDR (Graphics DDR) memory attached in a close array around the GPU. Older cards might include secondary chips, like display controllers, that more modern cards integrate on-chip. Modern dGPUs have used the PCI Express standard since around 2004; older cards used the AGP and PCI graphics standards.

What Makes a GPU Different From a CPU?

GPUs and CPUs are designed for very different types of workloads, to the point that even an untrained observer can see the difference between the two from a simple block diagram. First, here's a CPU:

Block diagram of the Zen 5 CPU

Block diagram of the AMD Zen 5 desktop CPU.

Credit: Chips and Cheese

CPU cores utilize features like out-of-order execution, branch prediction, large caches, and a sophisticated array of integer and floating-point execution units to maximize single-thread performance. You can think of modern CPU cores as the "accelerator of last resort," meaning they are designed to run code that's hard to parallelize, offload, or otherwise improve. If you've got a big, branchy workload or a lot of unavoidable single-threaded processing, you probably want to run it on a CPU. Modern chips frequently run at 5GHz+ frequencies, and desktops commonly feature 8-24 cores.
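Here's a contrived sketch of the kind of work that resists offloading: each iteration depends on the result of the previous one and branches on data that isn't known until runtime, so there is no way to split it across thousands of GPU threads.

    // serial_chain.cpp -- a contrived, serially dependent, branchy loop (illustrative).
    // Every step needs the previous result, so the work cannot be spread across threads.
    #include <cstdio>
    #include <cstdint>

    int main() {
        uint64_t state = 88172645463325252ULL;
        uint64_t sum = 0;

        for (int i = 0; i < 100000000; ++i) {
            // xorshift step: the next state depends entirely on the current one.
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;

            // Data-dependent branch: the path taken isn't known until runtime.
            if (state & 1)
                sum += state >> 32;
            else
                sum -= state & 0xFFFF;
        }
        printf("sum = %llu\n", (unsigned long long)sum);
        return 0;
    }

A fast CPU core with good branch prediction chews through a chain like this far more effectively than a GPU ever could.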

Block diagram of the AD102 GPU core

Block diagram of an Nvidia AD102 GPU. Notice how much more repetitive this is?

Credit: Nvidia

GPUs, even integrated GPUs, commonly offer dozens to hundreds of compute units, each of which is much simpler than its x86 or ARM equivalent. GPUs are designed to execute large workloads in parallel across this entire array of relatively simple cores, and the physical architecture of the chip reflects it. What we call a "GPU" is really a cluster of compute units designed to execute a workload in parallel across the entire device.

One of the distinctions between a GPU and a CPU is that GPUs tend to be miserable at multitasking. Where CPUs are explicitly designed to shift between workloads and run many applications simultaneously while maintaining a responsive UI, GPUs are not. If you've ever tried to run two graphically intensive games at the same time, you've probably experienced this. Even a high-end GPU that could run either game on its own at silky smooth framerates may see performance tank if asked to run both at once. GPUs are particularly good at a class of so-called "embarrassingly parallel" applications, of which graphics is one. Embarrassingly parallel applications are easy to split into independent chunks of work and face few restrictions on scaling. They respond well to wider compute engines until they hit some other bottleneck, like available VRAM or the CPU's ability to keep the GPU fed.
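Graphics itself is the canonical embarrassingly parallel workload, since every pixel can be computed without consulting its neighbors. The illustrative CUDA sketch below applies a simple brightness adjustment with one thread per pixel; the image size and scale factor are arbitrary:

    // brighten.cu -- one GPU thread per pixel, no coordination between threads (illustrative).
    // Build with: nvcc brighten.cu -o brighten
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    __global__ void brighten(unsigned char* pixels, int width, int height, float scale) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int idx = y * width + x;                     // one thread owns exactly one pixel
        float v = pixels[idx] * scale;
        pixels[idx] = v > 255.0f ? 255 : (unsigned char)v;
    }

    int main() {
        const int w = 1920, h = 1080;
        std::vector<unsigned char> img(w * h, 100);  // a flat gray "image"

        unsigned char* d_img;
        cudaMalloc(&d_img, img.size());
        cudaMemcpy(d_img, img.data(), img.size(), cudaMemcpyHostToDevice);

        // Tile the image with 16x16 thread blocks; every pixel is independent work.
        dim3 block(16, 16);
        dim3 grid((w + 15) / 16, (h + 15) / 16);
        brighten<<<grid, block>>>(d_img, w, h, 1.5f);

        cudaMemcpy(img.data(), d_img, img.size(), cudaMemcpyDeviceToHost);
        printf("pixel(0,0) = %d (was 100)\n", img[0]);  // 100 * 1.5 = 150
        cudaFree(d_img);
        return 0;
    }

Because no thread ever needs another thread's result, the same code scales from a small integrated GPU to a flagship card with thousands of execution units, which is exactly the property the paragraph above describes.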

The Workload Evolution of the GPU

The GPU is arguably the system component that has evolved the most over the past 30 years. CPUs now have far more cores and run at much higher clock speeds than they once did, but the latest Ryzen and Core Ultra processors from AMD and Intel still leverage some of the same ideas that powered the Pentium Pro. GPUs, as mentioned earlier, have evolved from simple fixed-function units, to shader programs, to fully programmable cores capable of running code written in languages like CUDA or OpenCL.

This evolution has been accompanied by an expansion in the types of applications GPUs are good for. Each stage of GPU evolution was accompanied by a surge of interest in using GPUs for new tasks.

Scientific computing was an early beneficiary: the late 2000s saw a surge of interest in using GPUs for oil and gas exploration, molecular dynamics, protein folding, and myriad other applications, and supercomputers began adopting GPUs around 2009. Gamers will recall the cryptocurrency boom (likely with little fondness), though custom-built ASICs have largely superseded that use case.

2018 brought Nvidia's Turing architecture and, with it, hardware-accelerated ray tracing on consumer cards. Although ExtremeTech gave Turing's ray tracing a less than enthusiastic review at the time, six years of GPU improvements and wider developer adoption have improved the value proposition. Ray tracing remains too computationally expensive to deploy pervasively, but games are beginning to use it more widely. Broader use of upscaling technologies like DLSS and FSR may improve ray tracing adoption by reducing the GPU's overall computational load, though upscaling can come with visual trade-offs of its own. Ray tracing has come a long way, but it still seems to be a generation or two away from full adoption.

Artificial intelligence is the latest workload to take the GPU market by storm. While it hasn't had the same impact on GPU prices as the cryptocurrency boom or the COVID pandemic, the relentless demand for GPUs to handle AI training and inference workloads has shifted the fundamental financials of the GPU market, possibly forever.

Until a few years ago, gaming was Nvidia's largest single segment. Today, the company's gaming revenue ($3.2B in the most recent quarter) is dwarfed by its data center revenue ($30.7B over the same period). AMD's revenue shows a similar, though much less dramatic, shift toward the data center and away from gaming.

If current trends continue, the GPU will have transformed from a gaming-optimized ASIC into an AI accelerator with some rasterization and ray tracing capabilities bolted on. It's a remarkable metamorphosis that speaks to both the importance of graphics processing and the ability of graphics processors to adapt useful processing principles to new workloads and scenarios.

From GLQuake to GEMM to ChatGPT, GPUs have transformed gaming, scientific computing, and now AI. Across every class of device, from 1W to 1kW, GPUs run GUIs, render video, and serve as the workhorse for an entire host of applications that don't scale well on the CPU. They may not command much cachet outside of the PC market, where Radeon and GeForce are both well-known brands, but they're essential to modern devices. Call that proof positive that specialization does work. Heinlein can go fly a kite.
