November 2025
The AI Technology Stack comprises a wide range of complex technologies that together enable AI to work. We’ll be exploring these in more depth in future Crash Courses – stay tuned!
In July 2025, the White House issued an Executive Order on "Promoting the Export of the American AI Technology Stack." The Executive Order highlights a strategic shift – recognizing that AI’s power doesn’t come from a single breakthrough, but from many interdependent technologies working together.
The AI technology stack is constantly evolving, so it should not be thought of as a static set of technologies. Rather, the value of the “stack” idea lies in providing a structured way to think about AI’s interdependent layers. This framing helps policymakers more productively explore how to support American competitiveness in AI. Progress in AI comes from cumulative advances across the technology stack, just as barriers to progress correspond to specific chokepoints in the stack.
What is the AI Technology Stack?
Creating and running AI is complex: it requires a layered, interconnected set of technologies that enable machines to process data, learn patterns, and make decisions.
This Crash Course focuses on the American AI technology stack as defined by the July 2025 Executive Order. It includes: AI-optimized computer hardware, data center storage, cloud services, and networking; data pipelines and labeling systems; AI models and systems; security and cybersecurity measures; and AI for specific use cases. Colloquially, these can be thought of as the infrastructure layer; the data layer; the model layer; the security layer; and the application layer.
The Infrastructure Layer
The hardware and infrastructure layer consists of AI-optimized computer hardware, data center storage, cloud services, and networking. These technologies enable all other layers of the stack to be developed and operated.
Chips
Most modern AI depends on specialized chips built to handle the enormous computations behind machine learning. The most common are Graphics Processing Units (GPUs) – originally designed for video gaming but now indispensable for training large AI models. Other specialized chips, such as Tensor Processing Units (TPUs) and custom AI accelerators, are architected solely for AI math rather than general computing. These chips can process AI workloads faster while consuming far less power than general-purpose processors. They have become a critical component of the modern stack, deployed by the millions in data centers to make running heavy AI workloads more economically viable.
In AI, these specialized chips typically perform two distinct jobs: training and inference.
Training: This is the computationally massive and energy-intensive process of creating an AI model by feeding it enormous datasets. The goal is to optimize the model's “parameters”—the billions of internal settings that determine its behavior. Adjusting these parameters to minimize errors requires the chips to perform complex matrix calculations on a massive scale. This process, which can take days or weeks, is performed in large-scale data centers using clusters of high-performance GPUs or other specialized AI accelerators.
Inference: This is the "live" process of using the trained model to make a prediction or generate an answer. Inference typically needs to be extremely fast, efficient, and low-latency to be useful. This workload can take place in on-premises or cloud-based data centers, or, as a result of advances in specialized chips and networking technologies, can happen increasingly at the “edge” – directly on connected technologies like phones, cars, factory equipment, and smart infrastructure. This can offer benefits to speed, privacy, and reliability, especially in high-stakes or low-connectivity environments.
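To make this distinction concrete, the sketch below shows a toy training loop followed by a single inference call, using PyTorch as an illustrative framework. The tiny model, random data, and hyperparameters are placeholders, not how a real foundation model is built; actual training runs span clusters of accelerators over days or weeks.

```python
# Minimal, illustrative sketch of training vs. inference using PyTorch.
# The toy model and random data are placeholders for a real workload.
import torch
import torch.nn as nn

# A toy model with a handful of parameters (real models have billions).
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# --- Training: adjust the parameters to minimize error on a dataset ---
inputs = torch.randn(1024, 16)           # stand-in for a training dataset
targets = torch.randn(1024, 1)
for step in range(100):
    predictions = model(inputs)           # forward pass (matrix math)
    loss = loss_fn(predictions, targets)  # how wrong is the model?
    optimizer.zero_grad()
    loss.backward()                       # compute gradients
    optimizer.step()                      # update the parameters

# --- Inference: use the trained parameters to answer a new query ---
model.eval()
with torch.no_grad():                     # no gradient math needed at inference time
    new_input = torch.randn(1, 16)
    answer = model(new_input)
```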
Data Center Storage
AI requires a specialized approach to storage that is built for massive scale and speed. AI needs large datasets for training, requiring high-performance, high-throughput storage that can provide data to all the chips simultaneously without bottlenecks.
Cloud Services
Data centers are the physical backbone of the cloud; cloud services are the tools and software that let customers interface with it. This is the layer that bundles the chips, storage, and networking into an on-demand service. U.S. cloud providers such as Amazon Web Services, Microsoft Azure, and Google Cloud offer "Infrastructure as a Service" (IaaS), allowing any company or researcher to rent access to a supercomputer's worth of power. This allows customers to scale their hardware resources up or down as needed without having to buy and maintain the physical infrastructure themselves.
Networking
Networking technologies enable data to move quickly and reliably throughout the AI stack. This layer consists of the high-speed connectivity inside the data center that links thousands of AI chips, allowing them to work together as a single massive computer to train one model. This requires ultra-low latency and high throughput, using technologies like InfiniBand or specialized high-speed Ethernet (RoCE).
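As a rough illustration of why this interconnect matters, the sketch below uses PyTorch's distributed "all-reduce" – the collective operation that lets many chips average their gradients so they can train a single model together. The setup details are placeholders; real clusters run this over InfiniBand or RoCE fabrics and are started by a dedicated launcher.

```python
# Illustrative sketch: the "all-reduce" collective that lets many GPUs
# average gradients and train one model together. Backend, ranks, and
# addresses are placeholders, not a real cluster configuration.
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module) -> None:
    """Sum each gradient across all workers, then divide by the worker count."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # This call traverses the data-center network fabric.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

# In a real job, a launcher (e.g., torchrun) sets the rank and world size,
# and dist.init_process_group("nccl") runs over the high-speed interconnect.
```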
The Data Layer
The data layer consists of data pipelines and labeling systems, which shape how information is collected, cleaned, and delivered into AI systems – determining what models learn and how well they perform. Even the most advanced models face constraints based on the data they are trained on.
Importantly, AI development and operation can rely on data processing at multiple stages and on a continuous basis. For example, an AI developer could license a base model to an enterprise customer, which then uses a technique called fine-tuning – training it on additional data to improve performance in specific circumstances.
Data Pipelines
Data pipelines are the technologies used to gather, process, and store data. This system must manage the massive volume, velocity, and variety of information required for AI. The process typically involves:
Data Acquisition: Collecting or "ingesting" raw data from its source. This can be public (e.g., government datasets, web scrapes) or proprietary (e.g., a company's internal logs, user content). Synthetic data – artificially generated by other AI models to closely resemble real-world data – can be a valuable resource and can offer other unique benefits, such as potentially being cheaper or more privacy-preserving than relying solely on real-world data.
ETL (Extract, Transform, Load): This is the core function of the pipeline. Software is used to extract data from its source, transform it (by cleaning it, removing errors, and standardizing formats), and load it into a central repository, such as a "data lake" or "data warehouse," where it is organized and accessible for training.
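A highly simplified sketch of an extract-transform-load step is shown below. The file names, fields, and cleaning rules are hypothetical; production pipelines rely on dedicated data-engineering platforms and run continuously at far larger scale.

```python
# Simplified, illustrative ETL step: extract raw records, transform (clean
# and standardize) them, and load them into a repository ready for training.
# File paths and cleaning rules are hypothetical placeholders.
import csv
import json

def extract(path: str) -> list[dict]:
    """Extract raw rows from a source file."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[dict]:
    """Clean the data: drop empty records and standardize formats."""
    cleaned = []
    for row in rows:
        text = (row.get("text") or "").strip()
        if not text:
            continue  # remove records with no usable content
        cleaned.append({"text": text, "source": row.get("source", "unknown").lower()})
    return cleaned

def load(rows: list[dict], path: str) -> None:
    """Load the cleaned records into a central repository (here, a JSONL file)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

if __name__ == "__main__":
    load(transform(extract("raw_data.csv")), "training_data.jsonl")
```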
Data Labeling
Data labeling is the process of turning raw data into high-value, usable inputs for AI training.
Labeling and Annotation: This is the foundational process of adding human judgment to raw data. It can be simple labeling (e.g., tagging an image as containing a "car" or "pedestrian") or more complex annotation (e.g., drawing a precise box around the car, transcribing spoken audio, or identifying specific names and organizations in a text document). One technique, called supervised fine-tuning (SFT), involves curating a dataset of labeled input-output pairs of prompts and verified responses. These labeled datasets are the "textbook" from which the AI model studies and learns to recognize patterns. When training a model, only a very small percentage of the training data is typically labeled or annotated.
Model Alignment and Human Feedback: This is a more advanced process used to train complex generative AI and large language models. Instead of just labeling right or wrong answers, this involves human experts "ranking" different AI-generated outputs based on quality, helpfulness, and safety. This feedback is then used in a process called Reinforcement Learning from Human Feedback (RLHF) to "align" the model's behavior with specific desired traits or performance characteristics. RLHF is evolving as AI capabilities increase, with Reinforcement Learning from AI Feedback (RLAIF) representing an increasing share of this work.
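To show what these two kinds of human-generated training data can look like, the records below are a hypothetical sketch: a labeled prompt-response pair for supervised fine-tuning, and a ranked preference pair of the kind used in RLHF. The field names are illustrative, not a standard schema.

```python
# Illustrative, hypothetical examples of the data these processes produce.
# Field names are placeholders, not a standard schema.

# Supervised fine-tuning (SFT): a curated prompt paired with a verified response.
sft_example = {
    "prompt": "Summarize the key idea of the AI technology stack in one sentence.",
    "response": "AI capability emerges from interdependent layers of hardware, "
                "data, models, security, and applications working together.",
}

# RLHF preference data: a human ranks two model outputs for the same prompt.
preference_example = {
    "prompt": "Explain what a GPU is to a policymaker.",
    "chosen": "A GPU is a chip that performs many calculations in parallel, "
              "which makes it well suited to training AI models.",
    "rejected": "GPUs are computer parts.",
}
```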
The Model Layer
The model layer consists of AI models and systems themselves – core pieces of software that define AI – as well as the tooling – the technologies required to develop and deliver AI.
AI Models and Systems
These have been defined in U.S. law (15 U.S.C. 9401(3)) as:
AI model: A software component of an information system that implements AI technology and uses computational, statistical, or machine-learning techniques to produce outputs from a defined set of inputs.
Underlying the modern generative AI boom are foundation models – AI models trained on large amounts of data and capable of serving as the foundation for a wide variety of specialized applications. Large language models (LLMs) are the most common kind of foundation model, but AI models can take many forms and leverage a variety of techniques, each suited to different kinds of data and tasks.
AI system: Any data system, hardware, tool, or utility that operates, in whole or in part, using AI.
An AI model can be thought of as the main “brain” of a particular piece of software, whereas an AI system packages the model along with other features such that it can support a wider array of users and uses. For example, OpenAI’s GPT-5.1 is the model, whereas ChatGPT is the system, which layers in a user interface, customization settings, editing features, and other tools to make a consumer-friendly application. Other common models and systems include Anthropic's Claude Sonnet 4.5 (model) and Claude Code (system), Google’s Gemini 3.0 (model) and Google AI Studio (system), and Meta's Llama 4 (model).
Tooling
Tooling consists of the software and platforms that constitute the "workbench" for building, managing, and deploying AI models. This ecosystem is critical for making model development efficient, reliable, and scalable.
Frameworks & Libraries: These provide pre-written code for common AI operations, allowing researchers to design complex models without having to invent the low-level mathematics each time.
Experiment Tracking: These tools log all the parameters, data versions, and results from every training run, enabling reproducibility and comparison.
Orchestration: Orchestration tools help automate the sequence of tasks involved in AI development, including pulling data, training the model, and deploying it.
MLOps (Machine Learning Operations): The combination of tooling and automation used to manage the complete lifecycle of a model in production. It ensures models are rigorously tested, deployed reliably, and monitored for performance, similar to "DevOps" for traditional software.
Model Serving & Monitoring: Serving tools make a trained model available for applications to use, often via an application programming interface (API). Monitoring tools watch the model's performance in the real world to detect "drift" (a drop in accuracy as new data comes in) and other issues.
The Security Layer
The security layer includes the security and cybersecurity measures that protect the rest of the technology stack. These mechanisms ensure that AI systems, the data that goes into them, and the infrastructure they rely on remain secure from misuse or exploitation.
This layer is unique because it is not a sequential step; rather, it is a set of safeguards that must be integrated across all other layers. It includes traditional cybersecurity measures – like network security, encryption, and access controls – that protect the hardware and data layers from unauthorized access or theft.
This layer also addresses new, AI-specific security challenges that go beyond traditional IT. This means securing the entire AI supply chain, from verifying the integrity of third-party data and pre-trained models to ensuring they haven't been tampered with. It also involves a shift in focus from just preventing breaches to ensuring the robustness and integrity of the model's behavior itself – building systems that are resilient to manipulation and behave reliably. Finally, this layer includes the governance and continuous monitoring required to ensure AI applications are used safely and as intended once they are deployed.
The Application Layer
While all the other layers of the stack are required to build and operate AI, the application layer consists of the many specific applications of AI that generate real-world value.
Many are already familiar with the popular consumer-facing LLM-powered chatbots like ChatGPT and Claude, but the application layer is vast, touching nearly every sector. The following examples are not exhaustive but are illustrative of the breadth of use cases that the complete AI technology stack enables:
Fraud detection: Stripe’s Radar system uses machine learning to assign scores to every transaction, block likely fraud, and adapt to new attack patterns.
Advanced robotics: Standard Bots uses 3D vision and specialized AI models that learn from demonstration, enabling its robots to perform generalized tasks after only basic demos.
Drug discovery: Google DeepMind’s AlphaFold 3 model predicts biomolecular interactions for proteins and more, potentially dramatically accelerating key steps of drug discovery.
Nuclear energy: Idaho National Laboratory is using AI to develop a “digital twin” virtual model of a small modular reactor to enable advanced modeling.
These use cases show how the "engine" of the model layer is put to work, powered by the hardware and data layers and protected by the security layer, to solve specific, complex problems across the economy and society.
Looking ahead
The AI technology stack is best understood as a dynamic system – one that adapts as innovations in hardware, data management, and modeling push its boundaries. Each layer influences the next, and improvements at any level cascade outwards, reshaping what AI can achieve. For example, a significant advancement in the energy efficiency of GPUs used to train large models could make it cheaper to train larger and more capable AI models. Conversely, a shortage of adequate data about chemistry could limit progress on the development of scientific foundation models that could rapidly accelerate scientific discovery.
Policymakers should use the AI technology stack as a conceptual framework to guide a wide variety of different policy interventions focused on both addressing potential risks and promoting competitiveness in AI. Just as performance changes at different parts of the stack will reverberate outwards, so too will policy changes. Clear regulatory guidance about how autonomous vehicles can operate could spur increased investment in the hardware and infrastructure and networking technologies necessary to accelerate autonomous vehicle development and deployment. And restrictions on how certain kinds of data can be collected or used could influence the quality or availability of specific kinds of AI applications that rely on that data.