Single-Node Power Demand During AI Training: Measurements on an 8-GPU NVIDIA H100 System

Imran Latif, Alex C. Newkirk, Matthew R. Carbone, Arslan Munir, Yuewei Lin, Jonathan Koomey, Xi Yu, Zhihua Dong

Research output: Contribution to journal › Article › peer-review

Abstract

The expansion of artificial intelligence (AI) applications has driven substantial investment in computational infrastructure, especially by cloud computing providers. Quantifying the energy footprint of this infrastructure requires models parameterized by the power demand of AI hardware during training. In this work, we measured the instantaneous power draw of an 8-GPU NVIDIA H100 HGX node during the training of an open-source image classifier (ResNet) and a large language model (Llama2-13b). We characterize power demand for a single-node configuration, providing foundational data for future multi-node studies. The maximum observed power draw was approximately 8.4 kW, 18% lower than the manufacturer-rated 10.2 kW, even with GPUs near full utilization. Holding model architecture constant, increasing the batch size from 512 to 4096 images for ResNet reduced total training energy consumption by a factor of 4. These findings can inform capacity planning by data center operators and energy-use estimates by researchers. Future work will investigate the impact of cooling technology and carbon-aware scheduling on AI workload energy consumption.
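
The abstract describes sampling instantaneous node power during training and deriving total training energy from it. As a minimal illustrative sketch only, not the paper's actual instrumentation (the study measures whole-node power, which also includes CPU, memory, and fans), per-GPU power on an NVIDIA system could be sampled through the NVML bindings (pynvml) and integrated over time; the one-second sampling interval here is an assumption for illustration.

    # Illustrative sketch: sample per-GPU power via NVML and integrate to energy.
    # Assumption: GPU-level sampling with pynvml, not the node-level measurement
    # used in the paper, which captures the full 8-GPU HGX node's draw.
    import time
    import pynvml

    pynvml.nvmlInit()
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
               for i in range(pynvml.nvmlDeviceGetCount())]

    interval_s = 1.0   # sampling period (assumed); finer intervals catch transients
    energy_j = 0.0     # accumulated energy in joules
    try:
        while True:    # in practice, run for the duration of the training job
            # nvmlDeviceGetPowerUsage returns milliwatts; sum across all GPUs
            power_w = sum(pynvml.nvmlDeviceGetPowerUsage(h) for h in handles) / 1000.0
            energy_j += power_w * interval_s   # rectangle-rule integration
            time.sleep(interval_s)
    except KeyboardInterrupt:
        pynvml.nvmlShutdown()
        print(f"Total GPU energy: {energy_j / 3.6e6:.3f} kWh")

Summing such samples over a run gives E ≈ Σ P·Δt, so at comparable average power a shorter wall-clock training time yields proportionally less energy; higher throughput at larger batch sizes is one plausible mechanism behind the factor-of-4 reduction reported in the abstract.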

Original language: English (US)
Pages (from-to): 61740-61747
Number of pages: 8
Journal: IEEE Access
Volume: 13
DOIs
State: Published - 2025

Keywords

  • AI training
  • GPU power measurements
  • large language models
  • sustainable computing

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
