Power Consumption in HPC-AI Systems
Abstract
Besides the need for speed that gave rise to HPC, energy has become a crucial concern addressed by both HPC hardware and software solutions. Energy in computing systems has several facets: cost, source, heat, carbon footprint, hardware lifetime, and more. From the standpoint of embedded systems, devices are battery-powered, so the amount of energy available for computation is limited. In such critical cases, it is vital to optimize every source of substantial power consumption. In more standard computing systems, including supercomputers, the question of energy saving mainly translates into electricity cost and carbon emissions. Indeed, the overall energy required to run an HPC infrastructure, including its cooling systems, represents a significant share of the maintenance budget. In addition, heat dissipation shortens hardware lifetime and reduces the mean time between failures (MTBF). One of the major application topics that HPC has to consider carefully is artificial intelligence (AI). Indeed, this processing paradigm has grown tremendously, with increasingly ambitious perspectives. The pervasiveness of AI solutions and the considerable computing time required, at least for the training phases, exacerbate the concern of power consumption in this context. The purpose of this chapter is to explore and illustrate the issue of power consumption in HPC-AI systems, so as to clarify it for the reader and highlight the main solutions and perspectives.