The Essential AI Glossary: Plain-Language Definitions for Common AI Terms

The fast-evolving world of artificial intelligence is dense with specialized language that can feel impenetrable to outsiders. Researchers and industry insiders rely on a rich lexicon of niche jargon and shorthand to describe their work, which means technical terms frequently slip into coverage of the AI space too. To cut through the confusion, we’ve built this growing glossary of key AI terms you’ll see regularly in our reporting, with clear, plain-language definitions for every entry.

This resource is updated on an ongoing basis as AI researchers advance the frontier of the field, roll out new techniques, and identify emerging safety risks.


AGI (Artificial General Intelligence)

Artificial general intelligence, or AGI, is a famously nebulous concept, but it generally refers to an AI system that outperforms the average human across most (if not all) common tasks. OpenAI CEO Sam Altman recently framed AGI as the “equivalent of a median human that you could hire as a co-worker,” while OpenAI’s official charter defines AGI as “highly autonomous systems that outperform humans at most economically valuable work.” Google DeepMind takes a slightly different approach, defining AGI as “AI that’s at least as capable as humans at most cognitive tasks.”

If that leaves you confused, you’re in good company—even leading AI researchers at the cutting edge of the field don’t agree on a single universal definition.

AI Agent

An AI agent is an AI-powered tool that completes sequences of tasks automatically on a user’s behalf, going far beyond the capabilities of basic AI chatbots. Common use cases include filing expense reports, booking travel or restaurant reservations, and even writing and maintaining full codebases.

That said, AI agents are still an emerging, fast-developing space, so the term can mean different things to different teams, and the underlying infrastructure needed to deliver on all its promised capabilities is still being built out. At its core, though, the concept refers to an autonomous system that can leverage multiple AI tools to complete complex, multi-step workflows without constant user input.

Chain of Thought

For simple everyday questions—like “Which is taller, a giraffe or a cat?”—humans can answer instantly without breaking down the problem step-by-step. But for more complex questions, we often need to work through intermediate steps to get the right answer, which might mean jotting notes down on paper. For example: if a farmer has a mix of chickens and cows, and together the animals have 40 heads and 120 legs, you’d need to work through a simple equation to get the correct result (20 of each animal).
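
The chicken-and-cow puzzle can be worked through step by step in code, too. This toy sketch (the function name and approach are purely illustrative) makes the intermediate steps explicit, which is exactly what chain-of-thought reasoning does for a language model:

```python
# Worked version of the chicken-and-cow puzzle, showing the intermediate
# steps a chain-of-thought answer would walk through.
def solve_farm(heads: int, legs: int) -> tuple[int, int]:
    # Step 1: if every animal were a chicken, there would be 2 * heads legs.
    chicken_only_legs = 2 * heads
    # Step 2: each cow contributes 2 extra legs beyond a chicken's 2.
    cows = (legs - chicken_only_legs) // 2
    # Step 3: the remaining heads belong to chickens.
    chickens = heads - cows
    return chickens, cows

print(solve_farm(40, 120))  # (20, 20)
```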

For large language models, chain-of-thought reasoning is the technique of breaking a complex problem down into smaller, sequential intermediate steps to produce a more accurate final result. It generally takes a bit longer to generate an answer this way, but the result is far more likely to be correct, especially for logic or coding problems. Modern reasoning models are built from traditional large language models and optimized for chain-of-thought reasoning via reinforcement learning.

(See: Large Language Model)


Compute

While “compute” is used in a few different contexts across AI, it generally refers to the raw processing power that AI models need to function. This processing capability is the backbone of the entire AI industry, powering everything from training new large models to deploying them for end users.

The term is also often used as shorthand for the physical hardware that provides this power, including graphics processing units (GPUs), central processing units (CPUs), tensor processing units (TPUs), and other infrastructure that forms the foundation of the modern AI industry.

Deep Learning

Deep learning is a subset of self-improving machine learning that uses algorithms structured as multi-layered artificial neural networks (ANNs). This layered structure lets deep learning systems identify far more complex correlations in data than simpler machine learning approaches like linear models or decision trees. The design of deep learning algorithms is explicitly inspired by the interconnected structure of neurons in the human brain.

Unlike simpler systems, deep learning AI models can identify key patterns and features in input data on their own, without requiring human engineers to manually define these characteristics ahead of time. Their layered structure also lets algorithms learn from mistakes, improving their own outputs over repeated cycles of testing and adjustment.

That said, deep learning systems require massive amounts of training data (usually millions of data points or more) to produce reliable results, and they typically take much longer to train than simpler machine learning algorithms. As a result, development costs for deep learning models are generally much higher.

(See: Neural Network)

Diffusion

Diffusion is the core technology behind many modern AI models that generate images, music, and other types of content. The concept is inspired by physics: in natural diffusion, random motion gradually breaks down a substance's structure until the original is unrecognizable (once a sugar cube dissolves completely into coffee, you can't put the cube back together).

AI diffusion models, however, learn to reverse this process: they train to undo the noise they’ve added to training data, ultimately learning how to reconstruct clean, original data from random noise.
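
The forward half of that process, the noising, is simple enough to sketch in a few lines. This is an illustrative toy, not a real diffusion model; the constants and array are invented for the example:

```python
import numpy as np

# Toy forward-diffusion process: repeatedly blending data with Gaussian
# noise destroys its structure, like the sugar cube dissolving. A diffusion
# model is trained to run this process in reverse, predicting and removing
# the noise step by step.
rng = np.random.default_rng(0)

def add_noise(x, beta=0.05):
    noise = rng.standard_normal(x.shape)
    return np.sqrt(1 - beta) * x + np.sqrt(beta) * noise

x = np.ones(4)       # stand-in for "clean" data
for _ in range(500):
    x = add_noise(x)  # after many steps, x is effectively pure noise
```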

Distillation

Model distillation is a knowledge transfer technique that extracts insights from a large “teacher” AI model to train a smaller “student” model. Developers run queries through the larger teacher model and record its outputs, then compare these outputs against ground truth data to measure accuracy. These outputs are then used to train the smaller student model, which learns to replicate the teacher model’s behavior.
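
Stripped to its core, the idea is that the student learns to match the teacher's output probabilities rather than learning from scratch. This minimal sketch (illustrative numbers, not a production recipe) shows that matching step:

```python
import numpy as np

# Minimal distillation sketch: the "student" is trained so its output
# probabilities match the teacher's recorded outputs (soft labels).
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

teacher_logits = np.array([2.0, 1.0, 0.1])
soft_labels = softmax(teacher_logits)      # recorded teacher outputs

student_logits = np.zeros(3)
for _ in range(1000):
    student_probs = softmax(student_logits)
    grad = student_probs - soft_labels     # cross-entropy gradient w.r.t. logits
    student_logits -= 0.5 * grad           # simple gradient-descent update

# The student now reproduces the teacher's probability distribution.
```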

The end result is a much smaller, more efficient model that retains most of the teacher’s capabilities with minimal loss of performance. This process is widely believed to be how OpenAI developed GPT-4 Turbo, the faster, more efficient iteration of its GPT-4 model.

While all AI developers use internal distillation to optimize their own models, some companies have also used distillation to replicate the performance of leading frontier models built by competitors. Distilling a competitor’s model via their public API or chat interface almost always violates the provider’s terms of service.

Fine-Tuning

Fine-tuning is the process of running additional training on an existing pre-trained AI model to optimize its performance for a specific task or domain, beyond what it learned during its initial general training. This is usually done by feeding the model new, task-specific training data.

Many AI startups build commercial products by starting with a pre-built general large language model, then boost its utility for a specific industry or task by fine-tuning it using their own domain-specific data and expertise.

(See: Large Language Model [LLM])

GAN (Generative Adversarial Network)

A GAN is a machine learning framework that powers many of the biggest advances in generative AI, especially for creating hyper-realistic content—including deepfake tools. GANs rely on a pair of interconnected neural networks that work against each other: the first network, called the generator, creates new content outputs from its training data and passes those outputs to the second network, called the discriminator. The discriminator’s job is to evaluate the generator’s output and judge whether it is real or artificially created.

The system is structured as a competition (hence “adversarial”): the generator is constantly trying to create outputs that fool the discriminator, while the discriminator is constantly getting better at spotting artificially generated content. This competitive dynamic lets the model produce increasingly realistic outputs over time without extra human input. GANs tend to work best for narrow use cases (like creating realistic photos or videos) rather than general-purpose AI applications.
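
The two-network structure looks roughly like this in code. Both functions here are simple stand-ins for trained neural networks, and the training updates themselves are omitted; this is a structural sketch only:

```python
import numpy as np

# Structural sketch of a GAN's two networks (stand-in functions, not real
# neural networks; the adversarial training updates are omitted).
rng = np.random.default_rng(1)

def generator(z):
    # Maps random noise to a candidate "fake" sample.
    return 2.0 * z + 1.0

def discriminator(x):
    # Scores how "real" a sample looks, between 0 (fake) and 1 (real).
    return 1.0 / (1.0 + np.exp(-(x - 1.0)))

noise = rng.standard_normal(8)
fakes = generator(noise)
scores = discriminator(fakes)
# During training, the generator's parameters are updated to raise these
# scores, while the discriminator learns to lower them for fakes.
```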

Hallucination

Hallucination is the standard term the AI industry uses for when an AI model makes up incorrect or fictitious information. It is one of the most pressing quality issues facing modern generative AI.

Hallucinated outputs can be misleading, and even pose real-world risks—for example, a hallucinated response to a medical question could give harmful health advice. That’s why most generative AI tools now include fine-print disclaimers warning users to verify AI-generated information, though these warnings are typically far less prominent than the outputs themselves.

The tendency of AI to fabricate information is generally tied to gaps in training data. For general-purpose generative AI (also called foundation models), this problem is especially hard to solve: there simply isn’t enough existing data to train an AI to correctly answer every possible question a user could ask. TL;DR: we haven’t invented God (yet).

The prevalence of hallucinations is one of the main drivers behind the growing push for more specialized, vertical AI models focused on a single domain. These domain-specific models require less broad knowledge, reducing the risk of gaps that lead to hallucinations and misinformation.

Inference

Inference is the process of running a trained AI model to generate predictions, answers, or outputs from new input data. Put simply, inference can’t happen before training: a model has to first learn patterns in training data before it can draw useful conclusions from new input.

Inference can run on a huge range of hardware, from smartphone chips to high-end cloud GPUs to custom-built AI accelerators, but not all hardware is equally good at running large models. A very large AI model that generates outputs in milliseconds on a high-end cloud server could take minutes to produce the same result on a consumer laptop.

(See: Training)

Large Language Model (LLM)

Large language models, or LLMs, are the AI models that power all popular modern AI assistants, including ChatGPT, Claude, Google Gemini, Meta Llama, Microsoft Copilot, and Mistral Le Chat. Every time you chat with an AI assistant, you’re interacting directly with an LLM, which processes your request either on its own or with support from add-on tools like web browsers or code interpreters.

It’s worth noting the distinction between the model and the product: for example, GPT is OpenAI’s large language model, while ChatGPT is the consumer AI assistant product built on top of it.

LLMs are deep neural networks made up of billions of numerical parameters (called weights, see entry below) that learn the relationships between words and phrases to build a complex statistical representation of human language. These models are trained by encoding patterns found in billions of books, articles, and text transcripts. When a user inputs a prompt, the LLM generates the most statistically likely sequence of text that fits the prompt: it predicts the most probable next word after the previous one, based on the context of what’s been written so far, and repeats this process until the response is complete.
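
That predict-the-next-word loop can be caricatured in a few lines. The probability table below is invented for illustration; a real LLM computes probabilities over tens of thousands of possible tokens using billions of weights and the full context of the conversation:

```python
# Drastically simplified next-word prediction: repeatedly pick the most
# probable next word until there is nothing left to predict.
next_word_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"down": 1.0},
}

def generate(start, max_steps):
    words = [start]
    for _ in range(max_steps):
        options = next_word_probs.get(words[-1])
        if not options:
            break
        words.append(max(options, key=options.get))  # most probable next word
    return " ".join(words)

print(generate("the", 5))  # the cat sat down
```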

(See: Neural Network)

Memory Cache

Memory caching is a key optimization technique that speeds up inference (the process of generating an AI response to a user query) by cutting down on redundant calculations. Generating an AI response involves an enormous number of mathematical calculations, and every calculation consumes energy and processing time.

Caching works by saving the results of frequent or previous calculations to reuse for future queries, eliminating the need to run the same calculation multiple times. One of the most common types of caching used in modern AI is KV (key-value) caching, which is used in transformer-based models. KV caching drastically cuts down on the processing time and computational work needed to generate responses, leading to faster output for users.
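
The save-and-reuse principle can be shown with Python's built-in memoization helper. (Transformer KV caching stores per-token key/value tensors rather than function results, but the underlying idea is the same.)

```python
from functools import lru_cache

# Caching sketch: results of previous calls are saved and reused, so the
# underlying computation runs only once per distinct input.
calls = 0

@lru_cache(maxsize=None)
def expensive_step(x):
    global calls
    calls += 1      # count how often the real computation actually runs
    return x * x    # stand-in for a costly attention calculation

results = [expensive_step(t) for t in (1, 2, 1, 2, 3)]
print(results, calls)  # [1, 4, 1, 4, 9] 3
```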

(See: Inference)

Neural Network

A neural network is the multi-layered algorithmic structure that forms the foundation of deep learning, and more broadly, the entire generative AI boom driven by modern large language models.

While the idea of designing data processing algorithms after the densely interconnected neuron pathways of the human brain dates all the way back to the 1940s, the rise of powerful GPU hardware (originally developed for the video game industry) is what actually unlocked the potential of neural network design. These chips made it possible to train algorithms with far more layers than was possible in earlier decades, allowing neural network-based AI systems to achieve dramatically better performance across a huge range of use cases, from voice recognition to autonomous navigation to drug discovery.

(See: Large Language Model [LLM])

RAMageddon

RAMageddon is the playful name for a serious industry trend: a growing global shortage of random access memory (RAM) chips, which power nearly every tech product we use daily. As the AI industry has exploded, the world’s largest tech companies and AI labs have been buying up massive volumes of RAM to power their AI data centers, leaving very limited supply for other industries. This supply crunch has driven up prices for remaining RAM chips across the board.

The shortage has impacted everything from consumer gaming (major console makers have had to raise prices because they can’t source enough memory chips) to smartphones (the RAM shortage is projected to cause the biggest drop in global smartphone shipments in over a decade) to general enterprise computing (companies can’t get enough RAM for their own data centers). Price hikes aren’t expected to cool off until the shortage eases, and there’s little sign that will happen anytime soon.

Training

Training is the core process used to develop modern machine learning AI systems. In simple terms, training involves feeding massive volumes of data into a blank model framework, so the model can identify patterns in the data and learn to generate useful outputs.

Before training begins, an AI model’s starting mathematical structure is just a set of layered random numbers—only through training does the model actually take shape. Over the course of training, the model adjusts its internal parameters to better match the desired output, whether that’s identifying images of cats or writing a custom haiku.
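
Here's that random-numbers-to-useful-model arc in miniature. This toy has a single parameter and invented data (real models have billions of parameters), but the shape of the loop is the same:

```python
import numpy as np

# Toy training loop: a one-parameter "model" starts as a random number and
# is nudged toward the pattern in the data (here, learning that y = 3x).
rng = np.random.default_rng(0)
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 3.0 * xs                    # the pattern hidden in the training data

w = rng.standard_normal()        # the model begins as a random parameter
for _ in range(200):
    preds = w * xs                            # forward pass
    grad = 2 * np.mean((preds - ys) * xs)     # mean-squared-error gradient
    w -= 0.01 * grad                          # adjust toward the target

print(round(w, 3))  # 3.0
```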

It’s important to note that not all AI requires training: simple rules-based AI, like basic linear chatbots programmed to follow pre-written instructions, doesn’t need any training. That said, these systems are usually far more limited in capability than well-trained self-learning AI.

Training is often very expensive, because it requires huge volumes of input data, and the amount of data required for modern large AI models continues to grow. Hybrid approaches, like fine-tuning an existing pre-trained model with new data, can cut down on development time and costs, requiring less data, compute, energy, and complexity than building a new model from scratch.

(See: Inference)

Tokens

Tokens are the fundamental building blocks of communication between humans and large language models. Humans communicate in natural language, but AI systems process information as discrete segments of numerical data. Tokens are these discrete, model-readable data segments, created through a process called tokenization that breaks down raw input text into distinct units that an LLM can process.

Similar to how a software compiler converts human-written code into binary that a computer can understand, tokenization converts a user’s natural language query into a format an LLM can work with to generate a response.
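
A toy word-level tokenizer shows the text-to-numbers mapping. (Real LLM tokenizers such as byte-pair encoding split text into subword units, not whole words, but the principle is the same; the example sentence reuses the giraffe question from earlier.)

```python
# Toy tokenizer: each new word gets the next available ID, and repeated
# words reuse the ID they were already assigned.
def tokenize(text, vocab):
    return [vocab.setdefault(word, len(vocab)) for word in text.lower().split()]

vocab = {}
input_tokens = tokenize("Which is taller, a giraffe or a cat?", vocab)
print(input_tokens)  # [0, 1, 2, 3, 4, 5, 3, 6] -- note "a" maps to 3 twice
```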

There are multiple categories of tokens: input tokens (created from a user’s query), output tokens (generated by the LLM as part of its response), and reasoning tokens (used for longer, more complex processing tasks for advanced user requests). For enterprise AI, token usage also directly determines cost: since tokens correspond to the amount of data a model processes, they have become the standard unit AI providers use to price their services. Most AI companies charge for LLM access on a per-token basis, so the more tokens a business uses, the higher their bill will be.

Transfer Learning

Transfer learning is a technique where developers use a fully trained existing AI model as the starting point to build a new model for a different but related task, letting the new model reuse knowledge the original model gained during training.

Transfer learning speeds up model development and cuts costs, and it’s especially useful when the amount of training data available for the new task is limited. That said, the approach has limitations: models built with transfer learning usually need additional training on new task-specific data to achieve good performance in their target domain.

(See: Fine-Tuning)

Weights

Weights are fundamental to AI training: they are numerical parameters that determine how much importance (or weight) the model assigns to different input features in the training data, which directly shapes the model’s final output.

Put another way, weights define which characteristics of a dataset are most relevant for the model’s training task, and they work by applying a multiplier to each input. Model training starts with randomly assigned weights, but as training progresses, the weights are gradually adjusted to help the model produce outputs that more closely match the desired target.

For example, an AI model trained to predict housing prices based on historical real estate data will assign weights to features like number of bedrooms and bathrooms, property type (detached vs. semi-detached), and whether the property includes parking or a garage. The final weights the model assigns to each input reflect how much that feature impacts the final property value, based on the training data.
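
In code, those weights are literally per-feature multipliers. All the numbers below are invented for illustration, not learned from real market data:

```python
import numpy as np

# Weights as per-feature multipliers in a toy housing-price model.
# Features: bedrooms, bathrooms, detached (1 = yes), parking (1 = yes).
features = np.array([3.0, 2.0, 1.0, 1.0])
weights = np.array([50_000.0, 30_000.0, 80_000.0, 20_000.0])
base_price = 100_000.0

price = base_price + features @ weights  # each feature scaled by its weight
print(int(price))  # 410000
```

During training, it is exactly these multiplier values that get adjusted until the model's predictions line up with the prices in the training data.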


