
A Summary of NVIDIA GPUs

Tags: graphics processing units, artificial intelligence, GPU

The NVIDIA GPU Landscape: From Edge to Enterprise

Understanding the modern NVIDIA GPU landscape is essential when building out AI pipelines, spinning up Docker containers for model training, or writing Python scripts for machine learning. The hardware stack ranges from low-power edge devices to massive, liquid-cooled data center racks.

Edge & Embedded: NVIDIA Jetson Orin

The Jetson Orin family is NVIDIA's edge computing platform: each module is a complete system-on-module (SoM) that combines a CPU, GPU, and memory on a single board.

  • Summary: Built for autonomous machines, robotics, and localized AI inference. It allows developers to run complex, containerized AI models directly on the hardware without needing a constant cloud connection.
  • Processing Power: Ranges from 40 TOPS (Trillion Operations Per Second) on the base Orin Nano (67 TOPS for the Orin Nano Super) up to 275 TOPS on the top-tier AGX Orin.
  • Relative Power: A massive leap over the older Jetson Nano, capable of natively running smaller transformer models and local LLMs.
  • Best Use Cases: Robotics, smart cameras, drones, and deploying Python-based computer vision scripts at the edge.
  • Form Factor: Compact SoM or small desktop developer kits.
  • Power Consumption: 7W to 60W, depending on the module.
  • Average Cost: Starts at $249 for the Orin Nano Super and scales up to around $2,000 for the AGX Orin 64GB.
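With TOPS and wattage each spanning a wide range across the family, performance per watt is a useful way to compare modules. A rough sketch (the 15W Orin Nano max-mode figure is an assumption, and using peak TOPS is a simplification; real efficiency depends on workload and the selected power mode):

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Peak INT8 throughput per watt: a rough edge-efficiency metric."""
    return tops / watts

# Figures from the spec list above; the Orin Nano's 15 W max mode
# is an assumed operating point, not an official rating.
print(f"Orin Nano: {tops_per_watt(40, 15):.1f} TOPS/W")
print(f"AGX Orin:  {tops_per_watt(275, 60):.1f} TOPS/W")
```

The takeaway is that the cheaper module is not necessarily the less efficient one; both land in the same TOPS-per-watt ballpark.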

The Consumer Titans: RTX 4090 & RTX 5090

The RTX line is designed for high-end consumer desktops and professional workstations. These cards dominate both gaming and local AI development (and double as excellent space heaters in the winter).

GeForce RTX 4090

  • Summary: The Ada Lovelace architecture flagship that redefined consumer performance limits.
  • Processing Power: ~83 TFLOPS of FP32 compute.
  • Relative Power: Roughly twice as fast as the previous generation's RTX 3090.
  • Best Use Cases: 4K gaming, 3D rendering, and local AI fine-tuning using its 24GB of GDDR6X memory.
  • Form Factor: Massive triple-slot or quad-slot desktop graphics card.
  • Power Consumption: 450W.
  • Average Cost: Launched at $1,599 MSRP, though retail prices often hover between $1,800 and $2,000.

GeForce RTX 5090

  • Summary: The Blackwell-based consumer apex predator, bumping VRAM to 32GB of ultra-fast GDDR7.
  • Processing Power: ~104 TFLOPS of FP32 compute.
  • Relative Power: Delivers roughly 30% to 40% higher AI and compute performance than the RTX 4090.
  • Best Use Cases: Enthusiast 8K gaming, heavy workstation AI training, and running larger local language models without memory bottlenecks.
  • Form Factor: Dual-slot desktop graphics card (the Founders Edition features a significantly slimmer design than the chunky 4090).
  • Power Consumption: 575W.
  • Average Cost: $1,999 MSRP.
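For the "running larger local language models" use case, the deciding factor is usually whether the weights fit in VRAM: parameter count times bytes per weight, plus working memory. A back-of-the-envelope check (the 20% overhead factor for activations and KV cache is an assumed fudge factor, not a measured value):

```python
def fits_in_vram(params_billions: float, bytes_per_param: float,
                 vram_gb: float, overhead: float = 1.2) -> bool:
    """Rough check: model weights plus ~20% headroom (an assumed
    allowance for activations and KV cache) against available VRAM."""
    weights_gb = params_billions * bytes_per_param  # 1B params = 1 GB at 1 byte/param
    return weights_gb * overhead <= vram_gb

# A 13B model quantized to 8-bit on a 24 GB RTX 4090 (weights ~13 GB):
print(fits_in_vram(13, 1.0, 24))   # True
# A 70B model quantized to 4-bit on a 32 GB RTX 5090 (weights ~35 GB):
print(fits_in_vram(70, 0.5, 32))   # False
```

This is why the 5090's jump from 24GB to 32GB matters more for local LLM work than its raw TFLOPS gain: whole model-size tiers become viable.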

Data Center Workhorses: A100, H100, & H200

These are the enterprise accelerators that power modern cloud infrastructure and fueled the global generative AI boom.

NVIDIA A100

  • Summary: Based on the Ampere architecture, this was the gold standard for enterprise AI before the generative AI explosion necessitated even faster chips.
  • Processing Power: 312 TFLOPS (FP16) and 19.5 TFLOPS (FP32).
  • Relative Power: The baseline reference point for modern LLM training, though vastly outpaced by newer generations today.
  • Best Use Cases: Traditional machine learning, data analytics, scientific computing, and cost-effective cloud AI deployment.
  • Form Factor: Dual-slot PCIe or SXM4 baseboard.
  • Power Consumption: 250W (PCIe) to 400W (SXM).
  • Average Cost: $8,000 to $14,000 to purchase new; cloud rentals typically cost $1.00 to $2.50 per hour.

NVIDIA H100

  • Summary: The Hopper architecture powerhouse behind the training runs of many of today's frontier LLMs.
  • Processing Power: 1,979 TFLOPS (FP16, with sparsity), introducing an FP8 precision format capable of 3,958 TFLOPS.
  • Relative Power: Delivers 3x to 4x faster AI training performance compared to the A100.
  • Best Use Cases: Hyperscale AI training, massive LLM development, and enterprise cloud compute.
  • Form Factor: Dual-slot PCIe or SXM5 baseboard.
  • Power Consumption: 350W (PCIe) to 700W (SXM).
  • Average Cost: $25,000 to $40,000 to purchase; cloud rentals range from $2.50 to $4.50 per hour.
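Given the rental rates above, the H100's price premium can still win on total job cost once its speedup is factored in. A back-of-the-envelope comparison (the 1,000 GPU-hour job and the specific hourly rates are hypothetical; the 3x speedup is the low end of the range quoted above):

```python
def job_cost(gpu_hours: float, rate_per_hour: float) -> float:
    """Total cloud rental cost for a training job."""
    return gpu_hours * rate_per_hour

# Hypothetical job needing 1,000 A100-hours at $2.00/hr, versus the
# same job on H100s at $4.00/hr with an assumed 3x speedup.
a100_cost = job_cost(1000, 2.00)       # $2,000
h100_cost = job_cost(1000 / 3, 4.00)   # ~$1,333
print(f"A100: ${a100_cost:,.0f}  H100: ${h100_cost:,.0f}")
```

Under these assumptions the faster chip is cheaper per job despite double the hourly rate, which is a big part of why demand shifted so quickly.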

NVIDIA H200

  • Summary: A mid-cycle refresh of the H100, featuring the exact same compute cores but paired with a massive upgrade to 141GB of HBM3e memory.
  • Processing Power: Identical raw compute to the H100 (3,958 TFLOPS FP8), but features an immense 4.8 TB/s of memory bandwidth.
  • Relative Power: The memory speed upgrade allows it to perform up to 1.9x faster in LLM inference than the H100.
  • Best Use Cases: High-volume LLM inference, serving massive 70B+ parameter models, and memory-bound scientific simulations.
  • Form Factor: PCIe or SXM5 baseboard.
  • Power Consumption: Up to 700W.
  • Average Cost: $30,000 to $45,000 to purchase; cloud rentals range from $3.50 to $10.00 per hour.
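Why memory bandwidth dominates inference: at batch size 1, every weight must stream through the GPU once per generated token, so bandwidth divided by model size gives a hard ceiling on decode speed. A sketch of that estimate (weights-only, batch 1, ignoring KV-cache traffic; all simplifying assumptions):

```python
def max_decode_tokens_per_sec(bandwidth_tb_s: float,
                              params_billions: float,
                              bytes_per_param: float) -> float:
    """Bandwidth-bound ceiling on batch-1 decoding: every weight is
    read once per token, so tokens/s <= bandwidth / model bytes."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / model_bytes

# A 70B-parameter model in FP8 (1 byte/param) on the H200's 4.8 TB/s:
print(f"{max_decode_tokens_per_sec(4.8, 70, 1.0):.0f} tokens/s ceiling")
```

Since the compute cores are identical to the H100's, this memory-side ceiling is exactly where the H200's claimed inference gains come from.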

The Next Generation: Blackwell B200 & Grace Blackwell

The Blackwell architecture is NVIDIA's latest enterprise leap, specifically designed to handle trillion-parameter models in real time.

NVIDIA B200

  • Summary: The direct successor to the Hopper line, designed to push AI factories into the next era.
  • Processing Power: 20 PFLOPS of FP4 compute and 10 PFLOPS of FP8.
  • Relative Power: Boasts up to 15x faster inference and 3x faster training performance compared to the H100.
  • Best Use Cases: Training trillion-parameter AI models and running Mixture-of-Experts (MoE) architectures at hyperscale.
  • Form Factor: HGX baseboards (SXM architecture) requiring advanced data center cooling.
  • Power Consumption: Configurable up to 1,000W or 1,200W per GPU.
  • Average Cost: Estimated at $40,000 to $50,000 per unit; cloud pricing varies wildly but often starts around $6.00 to $18.00 per hour depending on bundled infrastructure.

NVIDIA GB200 (Grace Blackwell NVL72)

  • Summary: The GB200 is a "Superchip" that directly connects two Blackwell GPUs with one ARM-based Grace CPU.
  • Processing Power: An entire GB200 NVL72 rack (combining 72 GPUs and 36 CPUs) acts as one massive logical GPU delivering 720 PFLOPS of FP8 compute.
  • Relative Power: Can achieve 30x faster real-time inference on trillion-parameter models compared to an equivalent H100 cluster.
  • Best Use Cases: Building national supercomputers, constructing the largest AI factories, and overcoming traditional GPU-to-CPU communication bottlenecks.
  • Form Factor: Rack-scale, liquid-cooled data center architecture.
  • Power Consumption: Massive rack-level power requirements, often exceeding 100kW per rack, mandating liquid cooling.
  • Average Cost: Single instances can be rented in the cloud for a few dollars an hour, but purchasing full GB200 NVL72 racks runs into the millions of dollars.
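The rack-level figures above line up with the per-GPU B200 numbers, which is a useful sanity check when reading vendor specs. A quick sketch (the ~1.2 kW per-GPU draw is the configurable maximum mentioned in the B200 section, used here as an assumed operating point):

```python
# Sanity-check the NVL72 figures: 72 GPUs at the B200's 10 PFLOPS of
# FP8 should reproduce the rack's quoted 720 PFLOPS.
gpus_per_rack = 72
fp8_pflops_per_gpu = 10
rack_pflops = gpus_per_rack * fp8_pflops_per_gpu
print(rack_pflops)  # 720

# Power: 72 GPUs at an assumed ~1.2 kW each is ~86 kW before CPUs,
# networking, and cooling overhead, hence rack budgets north of 100 kW.
gpu_kw = gpus_per_rack * 1.2
print(f"{gpu_kw:.0f} kW for GPUs alone")
```

The GPU draw alone explains why air cooling is off the table at this density and liquid cooling is mandatory.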