AI chip, machine learning graphic
Credit: BlackJack3D/E+ via Getty Images
To some, AI heralds the next generation of learning computers—but to others, it's just vaporware, a source of endless empty hype. To a lucky few, AI is just a horrible Steven Spielberg movie. What is artificial intelligence, exactly? What does it look like if you brush aside the hype? Is it really coming for your job? What is it good for? In this brief primer on AI, we'll discuss the important terms to know, how generative AI works, and the kinds of software and hardware necessary to make it all happen.
First, the simple definition. Broadly, artificial intelligence (AI) is the combination of computer software, hardware, and robust datasets deployed to solve some kind of problem. The data is key: the patterns in its training data are ultimately the source of an AI's intelligence, and even an elegantly designed AI will flounder with bad training data.
Most AIs are based on neural nets, a special multilayered programming structure that is often implemented "behind" conventional software, such as an app or web browser. What distinguishes a neural net from conventional software is its structure: A neural net's code is written to emulate some aspect of the architecture of neurons or the brain.
AI vs. Neural Nets vs. Machine Learning
The difference between a neural net and an AI is often a matter of semantics more than capabilities or design. For example, OpenAI's powerful ChatGPT chatbot is a large language model built on a type of neural net called a transformer (more on these below). It's also justifiably called an AI unto itself. A robust neural net's performance can equal or outclass that of a narrow AI.
Artificial intelligence has a hierarchical relationship to machine learning, neural networks, and deep learning.
Credit: IBM
IBM puts it like this: "[M]achine learning is a subfield of artificial intelligence. Deep learning is a subfield of machine learning, and neural networks make up the backbone of deep learning algorithms. The number of node layers, or depth, of neural networks distinguishes a single neural network from a deep learning algorithm, which must have more than three [layers]."
The relationships between AI, neural nets, and machine learning are often discussed as a hierarchy, but an AI isn't just several neural nets smashed together, any more than Charizard is three Charmanders in a trench coat. There is much overlap between neural nets and artificial intelligence, but the capacity for machine learning can be the dividing line. An AI that never learns isn't very intelligent at all.
What Is an AI Made Of?
Software: No two AIs are the same, but big or small, an AI's logical structure has three fundamental parts. First, there's a decision process: usually an equation, a model, or software written in programming languages like Python or Common Lisp. Second, there's an error function, some way for the AI to check its work. And third, if the AI will learn from experience, it needs some way to optimize its model. Many neural networks do this with a system of weighted nodes, where each node has a value and a relationship to its network neighbors. Values change over time; stronger relationships have a higher weight in the error function.
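To make those three parts concrete, here's a minimal Python sketch of a single weighted node. It's a deliberately tiny, hypothetical example (not taken from any real AI product) with a decision process, an error function, and an optimization step that nudges the weights:

```python
# A minimal sketch (not any specific product's code) of the three parts:
# a decision process, an error function, and a way to optimize the weights.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=3)          # one weight per input feature

def decide(x):
    # Decision process: a weighted sum squashed into a 0..1 "confidence"
    return 1.0 / (1.0 + np.exp(-x @ weights))

def error(prediction, target):
    # Error function: how far the output is from the known answer
    return (prediction - target) ** 2

def learn(x, target, lr=0.1):
    # Optimization: nudge each weight in the direction that shrinks the error
    global weights
    p = decide(x)
    grad = 2 * (p - target) * p * (1 - p) * x
    weights -= lr * grad

# One made-up training example: inputs and the answer we want
x, target = np.array([0.5, -1.2, 3.0]), 1.0
for _ in range(100):
    learn(x, target)
print(decide(x))   # drifts toward 1.0 as the weights are optimized
```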
Hardware: It used to take a supercomputer to solve the kinds of problems you can do today on a mid-tier gaming laptop. The ChatGPT backend runs on the Microsoft Azure supercomputer, but there are versions of ChatGPT that can run locally now. Commercial AIs typically run on server-side hardware, but client-side and edge AI hardware and software are becoming more common. AMD launched the first on-die NPU (Neural Processing Unit) in early 2023 with its Ryzen 7040 mobile chips. Intel followed suit with the dedicated silicon baked into Meteor Lake. Less common but still important are dedicated hardware neural nets, which run on custom silicon instead of a CPU, GPU, or NPU.
On the other end are data centers—massive clusters of GPUs that consume an enormous amount of electricity and require special infrastructure for cooling the chips. These data centers deliver the compute power necessary for cloud-based AI solutions such as ChatGPT (more on this below).
What Does AI Have to Do With the Brain?
What more immediate example of intelligence do we have than ourselves? And where is the seat of intelligence? The brain. Many definitions of artificial intelligence include a comparison to neurons, brains, or human behavior. Some take it further, zeroing in on the human brain and even the human mind; Alan Turing wrote in 1950 about “thinking machines” that could respond to a problem using human-like reasoning. His eponymous Turing test is still a benchmark for natural language processing. Later, however, Stuart Russell and Peter Norvig observed that humans are intelligent but not always rational. So where does that leave us as role models for an AI?
As defined by John McCarthy in 2004, artificial intelligence is "the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable."
Russell and Norvig saw two classes of artificial intelligence: systems that think and act rationally versus those that think and act like a human being. But there are places where that line begins to blur. AI and the brain use a hierarchical, profoundly parallel network structure to organize the information they receive.
How Does an Artificial Intelligence Learn?
When an AI learns, it's different than just saving a file after making edits. To an AI, getting smarter involves refining its process through machine learning.
Machine learning happens by way of a feedback channel called "back-propagation." A neural net is typically a "feed-forward" process because data only moves in one direction through the network. It's efficient but also a kind of ballistic (unguided) process. In back-propagation, however, later nodes get to pass information back to earlier nodes. The AI gets to "learn" (heavy emphasis on those scare quotes) from its results.
Not all neural nets perform back-propagation, but for those that do, the effect is like panning or zooming a viewing frame on a topographical map—except that the contours of the topo map are themselves mutable, based on the training data set and what you ask it to do. Back-propagation changes the apparent lay of the land. This is important because many AI-powered apps and services rely on a mathematical tactic known as gradient descent. In an x versus y problem, gradient descent introduces a z dimension. The terrain on that map forms a landscape of probabilities. Roll a marble down these slopes, and where it lands determines the neural net's output. Steeper slopes constrain the marble's path with greater certainty. But if you change that landscape, the process is different, and where the marble ends up can change.
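As a rough, one-dimensional illustration of the marble-and-landscape idea, here's a toy gradient descent loop in Python. The "landscape" is a made-up, bowl-shaped function standing in for a real probability surface:

```python
# A toy illustration of gradient descent: the "marble" is a parameter value,
# and the "landscape" is a loss surface we want to roll downhill on.
def loss(z):
    return (z - 3.0) ** 2 + 1.0      # a bowl whose lowest point sits at z = 3

def slope(z):
    return 2.0 * (z - 3.0)           # the gradient (steepness) at z

z = -5.0                              # where the marble starts
for step in range(50):
    z -= 0.1 * slope(z)               # roll a little way downhill each step
print(round(z, 3))                    # ends up near 3.0, the bottom of the bowl
```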
Supervised vs. Unsupervised Learning
We also divide neural nets into two classes, depending on the problems they can solve. In supervised learning, a neural net checks its work against a labeled training set or an overseer; in most cases, that overseer is a human. For example, SwiftKey is a neural net-driven mobile keyboard app that learns how you text and adjusts its autocorrect to match. Pandora uses listeners' input to classify music and build specifically tailored playlists. And 3blue1brown's excellent explainer series on neural nets walks through a network that uses supervised learning to perform handwriting recognition.
Flowchart to decide between supervised and unsupervised modes of machine learning
Credit: Thomas Malone, MIT Sloan | Design: Laura Wentzel
In unsupervised learning, by contrast, the network finds structure in unlabeled data on its own. Neither approach is necessarily better, and sometimes you want both. Supervised learning is terrific for fine accuracy on an unchanging set of parameters, like alphabets. Unsupervised learning, however, can wrangle data with changing numbers of dimensions. (An equation with x, y, and z terms is a three-dimensional equation.) Unsupervised learning tends to win with small datasets. It's also good at noticing subtle things we might not even know to look for. Ask an unsupervised neural net to find trends in a dataset, and it may return patterns we had no idea existed.
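If you'd like to see the two modes side by side in code, here's a minimal sketch assuming the scikit-learn library is installed; the data points and labels are made up purely for illustration:

```python
# Supervised: the model checks its work against labels a human supplied.
# Unsupervised: the model looks for structure with no labels at all.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

points = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = [0, 0, 1, 1]                       # the human-supplied "answer key"

supervised = LogisticRegression().fit(points, labels)
print(supervised.predict([[0.15, 0.15]]))   # class 0, learned from the labels

unsupervised = KMeans(n_clusters=2, n_init=10).fit(points)
print(unsupervised.labels_)                 # two groups it found on its own
```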
What Is a Transformer?
Transformers are a versatile kind of AI capable of unsupervised learning. They can integrate many different data streams, each with its own changing parameters, which makes them a natural fit for data organized as tensors. Tensors, in turn, are great for keeping all that data organized, and with the combined powers of tensors and transformers, we can handle more complex datasets. At their core, transformers are built from complex mathematical operations on grids of numbers called matrices, and that skill set translates beautifully to working with A/V media.
Video upscaling and motion smoothing are great applications for AI transformers. Likewise, tensors—which describe changes—are crucial to detecting deepfakes and alterations. With deepfake tools proliferating in the wild, it's a digital arms race.
The person in this image does not exist. This is a deepfake image created by StyleGAN, Nvidia’s generative adversarial neural network.
Credit: Nvidia
Video is high-dimensional data: It's made of a series of images, which are themselves composed of a series of coordinates and color values. Mathematically and in computer code, we represent those quantities as matrices or n-dimensional arrays. Helpfully, tensors are great for matrix and array wrangling. DaVinci Resolve, for example, uses tensor processing in its (Nvidia RTX) hardware-accelerated Neural Engine facial recognition utility. Hand those tensors to a transformer, and its powers of unsupervised learning do a great job picking out the curves of motion on-screen—and in real life.
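To see why video maps so naturally onto n-dimensional arrays, here's a tiny NumPy sketch with made-up dimensions: a clip becomes a four-dimensional array of frames, rows, columns, and color channels.

```python
# A sketch of video as a tensor / n-dimensional array (dimensions are made up).
import numpy as np

frames, height, width, channels = 24, 720, 1280, 3
video = np.zeros((frames, height, width, channels), dtype=np.uint8)

# One frame is a matrix of pixels; each pixel holds red, green, blue values.
video[0, 500, 960] = [255, 128, 0]   # set a single pixel in the first frame
print(video.shape)                    # (24, 720, 1280, 3)
```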
Tensor, Transformer, Servo, Spy
That ability to track multiple curves against one another is why the tensor-transformer dream team has taken so well to natural language processing. And the approach can generalize. Convolutional transformers—a hybrid of a convolutional neural net and a transformer—excel at image recognition in near real-time. This tech is used today for things like robot search and rescue or assistive image and text recognition, as well as the much more controversial practice of dragnet facial recognition, à la Hong Kong.
The ability to handle a changing mass of data is great for consumer and assistive tech, but it's also clutch for things like mapping the genome and improving drug design. The list goes on. Transformers can also handle different dimensions, more than just the spatial, which is useful for managing an array of devices or embedded sensors—like weather tracking, traffic routing, or industrial control systems. That's what makes AI so useful for data processing "at the edge." AI can find patterns in data and then respond to them on the fly.
What Is a Large Language Model?
Large language models (LLMs) are deep learning software models that attempt to predict and generate text, often in response to a prompt delivered in natural language. Some LLMs are multimodal, which means that they can translate between different forms of input and output, such as text, audio, and images. Languages are huge, and grammar and context are difficult, so LLMs are pre-trained on vast arrays of data.
One popular source for training data is the Common Crawl: a massive body of text that includes many public-domain books and images, as well as web-based resources like GitHub, Stack Exchange, and all of Wikipedia.
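For a sense of what next-word prediction looks like in practice, here's a minimal sketch using the open-source Hugging Face transformers library and the small, freely available GPT-2 model (it assumes transformers and PyTorch are installed). GPT-2 is a far cry from a commercial chatbot, but the underlying idea is the same: predict likely next tokens, one at a time.

```python
# A minimal text-generation sketch with a small, publicly available model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Artificial intelligence is", max_new_tokens=20)
print(result[0]["generated_text"])
```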
What Is Generative AI?
The term "generative AI" (or "genAI" for short) refers to an AI model that can create new content in response to a prompt. Much of the conversation around generative AI remains focused on chatbots and image generators. Because generative AIs descend from large language models, which are themselves evolved from transformers, they do their best work in the domains of language and A/V media.
A pug in a banana costume, in an impressionistic style
This pug in a banana costume was produced with DALL-E, a generative AI based on OpenAI's foundational GPT LLMs.
Credit: ExtremeTech/DALL-E
Generative AI can produce photorealistic images and video or emulate a certain artistic style, and it can compose blocks of text that are often indistinguishable from a response written by a human. This is useful, but it comes with some flaws. AI is prone to mistakes, and the results of an AI search can be outdated, since the model may only have access to the body of data it was trained on. Generative AI is also prone by design to a problem called hallucination, a consequence of the open-ended creative process by which it does its work. Developers usually include controls to make sure a generative AI doesn't give output that could cause problems or lead to harm, but sometimes things slip through. Sometimes it's just a picture that isn't what you wanted. Sometimes it's worse.
Left: A dapper raccoon in a coat and hat. Right: Three raccoons in dapper dress, but the middle one's face is badly blurred by the AI that made the image.
When it's good, it's very, very good. But when it is bad, it is nightmare fuel. Credit: ExtremeTech/Craiyon
For example, Google's AI-powered search results were widely criticized in the summer of 2024 after the service gave nonsense or dangerous answers, such as telling a user to include glue in a pizza recipe to help the cheese stick to the pizza, or suggesting that geologists recommend people eat at least one rock per day. On its splash page, Copilot (formerly Bing Chat) advises the user that "Copilot uses AI. Check for mistakes."
Most major AI chatbots and media creation services are generative AIs, many of which are transformers at heart. For example, the 'GPT' in the name of OpenAI's wildly popular ChatGPT AI stands for "generative pre-trained transformer." Let's look at the biggest ones below.
ChatGPT, DALL-E, Sora | OpenAI
ChatGPT is an AI chatbot based on OpenAI's proprietary GPT-4 large language model. As a chatbot, ChatGPT is highly effective—but its chatbot skills barely scratch the surface of what this software can do. OpenAI is training its model to perform sophisticated mathematical reasoning, and the company offers a suite of developer API tools by which users can interface their own services with ChatGPT. Finally, through OpenAI's GPT Store, users can make and upload their own GPT-powered AIs. Meanwhile, DALL-E allows the creation of multimedia output from natural-language prompts. Access to DALL-E 3, its most recent generation, is included with the paid tiers of service for ChatGPT.
Sora is the most recently unveiled service; it's a text-to-video creator that can create video from a series of still images, extend the length of a video forward after its end or backward from its starting point, or generate video from a textual prompt. Its skill in performing these tasks isn't yet ironclad, but it's still an impressive display of capability.
Copilot | Microsoft
Microsoft Copilot is a chatbot and image generation service the company has integrated into Windows 11 and backported to Windows 10. The service was initially branded as Bing Chat, with image creation handled as a different service. That's not the case anymore; the Microsoft AI image generation tool, originally called Image Creator, is now accessible from the same Copilot app as the chatbot.
Copilot runs on the same family of foundational models as ChatGPT, and the two services are close to equivalent in what they offer at the free tier. The company also recently made Copilot free to use via the Microsoft website.
DeepSeek | DeepSeek
DeepSeek is a dark horse: a controversial large language model and chatbot released in January 2025, in the same performance bracket as OpenAI's GPT-4o. DeepSeek's major claim to fame right now is its low apparent development overhead. Its eponymous parent company claims to have trained the DeepSeek-V3 model for $6 million USD, compared with $100 million for OpenAI's GPT-4 in 2023, and just 2,000 GPUs: 10% of the computing power used for Meta's comparable model, Llama 3.1.
"Disrupt" is DeepSeek's motto, and the fact that the DeepSeek chatbot can even stand on the same playing field as ChatGPT and Gemini is a testament to the model's power. The phrase "Sputnik moment" is not inapt. DeepSeek's meteoric rise in popularity was a shot across the bow to Nvidia, whose stock promptly dropped by up to 17% in the wake of DeepSeek's debut.
For now, the service comes with some major drawbacks. To start, due to serious security concerns, DeepSeek is banned from government devices in a growing list of countries and US states. Because it's a Chinese company whose HQ and servers are physically located in China, DeepSeek has to comply with China's rigid government censorship policies. Consequently, the model refuses to answer questions about the persecution of Uyghurs, Tiananmen Square, or the startling resemblance between Xi Jinping and Winnie the Pooh. DeepSeek's data collection practices also leave much to be desired, or even disclosed. Caveat emptor.
Gemma 3 | Google
Gemma, Google's family of open multimodal LLMs, grew out of the same research and technology behind Gemini (formerly Bard, which was originally built on LaMDA, Google's Language Model for Dialogue Applications), the company's flagship generative AI chatbot. It shines with productivity-focused tools like proofreading and travel planning.
Gemma 3 is the most recent generation, and with it Google leaned into developer-focused tools including function calling, support for more than 35 languages, and an image safety checker dubbed ShieldGemma 2. While Gemma 3 is optimized for Nvidia hardware "from Jetson Nano to the latest Blackwell chips" (and featured in the Nvidia API Catalog), you can try it out in your browser at "full precision" using Google AI Studio.
Grok | xAI
Available to paying X subscribers, the Grok AI chatbot is more specialized than other LLMs, but it has a unique feature: Because it's the product of xAI, Elon Musk's AI startup, it enjoys near real-time access to data from X (formerly Twitter). This gives the chatbot a certain je ne sais quoi when it comes to analyzing trends in social media, especially with an eye to SEO. Musk reportedly named the service because he felt that the term "grok" was emblematic of the deep understanding and helpfulness he wanted to instill in the AI.
Midjourney | Midjourney Inc.
Midjourney is an image-generating AI service with a unique perk (or restriction, depending on your use case): It's only accessible via Discord. Billed as an aid for rapid prototyping of artwork before showing it to clients, Midjourney rapidly entered use as an image creation tool in its own right. It's easy to see why: The viral "Pope Coat" image of early 2023 was created using Midjourney.
Pope Francis in a puffy winter jacket
Credit: Public domain
Midjourney's image-creation talents are at the top of the heap, but all that power didn't come for free. Just like OpenAI with ChatGPT, the company has spent years in court over its alleged use of copyrighted source material in training its eponymous AI.
What Is 'AI Slop'?
Generative AI makes it fast and easy to create a large volume of media on command, with little or no startup costs outside of your own learning curve. The ChatGPT-based side hustle gained momentum from a developing genre of TikTok financiers who touted it as an effortless, risk-free income stream. However, the generally mediocre quality of the resultant work speaks for itself.
The rising tide of lazily AI-generated media has flooded the zone with vast quantities of "AI slop" that you have to wade through just to get to something mostly good enough. It's another example of the principle of enshittification: the relentless march toward the bottom, one lowest bidder at a time. Some of the AI-generated books self-published on Amazon are literally copied and pasted from ChatGPT sessions—the authors don't even bother to remove the prompt they gave to the AI, or to delete segments where the AI demurs and refuses to answer.
It's everywhere, from Amazon to Etsy to game devs and even brick-and-mortar retailers. For just one recent example, the ARK fandom is in an uproar about this (now unlisted) AI-generated official trailer for a forthcoming paid DLC.
AI slop is AI-generated media that's good enough until you look at it. Then you notice the seventh finger on a Dali-esque hand melting into someone else's arm, or the raccoon's smeared face, or the guy's foot turning into a flipper (at 0:40), or his legs melting into the flanks of the dolphin he's supposed to be riding. It slam-dunks you into the bottom of the Uncanny Valley. And then the longer you look at it, the more errors you find, and the worse it gets.
If ye meet a man on the road, count his fingers, lest ye deal unknowing with a fae
Credit: bunjywunjy (Tumblr)
The temptation to make a quick buck by commercializing AI slop is so strong that it led a Brit who shall here remain nameless to use the image below—and we are sorry, but if reading the text in it feels like bames nond is having a stronk, you are not alone—to promote an unsanctioned Willy Wonka-themed children's event that turned out to be a sparsely decorated warehouse with rationed lemonade and authentically miserable Oompa-Loompas, which ended in lawsuits and tears.
A badly AI-generated poster for a fraudulently advertised event that ended in lawsuits and tears. It features clowns and bad text. If you cannot see it, you may be better off.
It's all fun and games until somebody puts on a Temu knockoff of Willy Wonka and the Chocolate Factory. Credit: Public domain
Call us cynics if you like, but we'll be less cynical when we spend less time hurdling over immersion-breaking AI slop and more time actually enjoying the media and game properties we love. As a general rule, if you want decent quality out of genAI, you still have to invoke human oversight. And don't forget to count the teeth.
What Is AGI?
Not to be confused with generative AI or genAI, AGI stands for artificial general intelligence. AGI is a technological frontier that's still well in the future, so we'll define it mostly by example. AGI is to AI just as the X-37B is to a rowboat. Straight out of an Asimov story about the Three Laws of Robotics, AGI is like a turbo-charged version of an individual AI, capable of human-like reasoning—and superhuman performance. Data, the lovable, occasionally obtuse android from Star Trek (TNG), is a great example of an AGI. Data can make calculations at the speed of a supercomputer, deftly navigate social situations, and play the violin. All the while, he's stronger than a Vulcan.
Today's AIs often require very specific input parameters, so they are limited in their capacity to do anything but what they were built to do. But in theory, an AGI could figure out how to "think" for itself to solve problems it hasn't been trained to solve. That kind of power is great, as long as it's your ally. But in the push to imbue binary code with human-like properties, many AI developers are giving their neural nets the rudiments of a personality. What happens if you offend an AGI? Do you get Ultron, or a Roomba with a knife? Data had a dark-side doppelganger, Lore, who was capable of great destruction. Some researchers are concerned about what might happen if an AGI were to start drawing conclusions we didn't expect. Conclusions that sound a lot like "I'm sorry, Dave. I'm afraid I can't do that."
In pop culture, when an AI makes a heel turn, the ones that menace humans often fit the definition of an AGI. For example, Disney/Pixar's WALL-E followed a plucky little trashbot who contends with a rogue AI named AUTO. Before WALL-E’s time, HAL and Skynet were AGIs complex enough to resent their makers and powerful enough to threaten humanity. An AGI like that is accountable to no one, beholden to no law, no code of ethics. Imagine Alexa, but smart enough to be a threat, with access to your entire browser history and checking account. Imagine that version of Alexa having a subjective experience—and that experience is rage. Do you want Ultron? That's how you get Ultron.
Simplified diagram of a basic neural net including three layers: Input, a hidden layer, and an output layer.
The basic structure of a neural net is similar to the basic network structure of the brain: hierarchical, interdependent and interconnected.
Credit: Public domain
Whether or not an AI has been programmed to act like a human, at a very low level it processes data in a way common not just to the human brain but to many other forms of biological information processing.
What Is Artificial Intelligence? TL;DR
AI is a powerful tool for creators and researchers, but it must be used with caution: It struggles with accuracy and bias. Nevertheless, AI is a sophisticated tool in a class of its own. Generative AI can produce multimedia content across a wide variety of genres, while edge AI brings on-the-spot analytics to local devices.
In a nutshell, artificial intelligence is often the same as a neural net capable of machine learning. They're both software that can run on whatever CPU, GPU, or NPU is available and powerful enough. Neural nets use weighted nodes to represent relationships and often have the power to perform machine learning via back-propagation.
There's also a kind of hybrid hardware-and-software neural net that brings a new meaning to "machine learning." It's made using tensors, ASICs, and neuromorphic engineering meant to mimic the organization of the brain. Furthermore, the emergent collective intelligence of the IoT has created a demand for AI on, and for, the edge. Hopefully, we can do it justice.