A delidded AMD Instinct MI300X
(Image credit: AMD / sam_naffziger)
Energy efficiency is key to rapidly increasing the performance of AI and HPC processors, so AMD and other companies fight for it fiercely with every new product generation. Back in 2021, AMD set itself a goal of increasing the energy efficiency of its EPYC processors and Instinct accelerators 30-fold by 2025 compared to 2020. With its latest EPYC 9005-series 'Turin' CPUs and Instinct MI300X GPUs, the company appears to be close to achieving that goal.
AMD energy efficiency graph
(Image credit: AMD)
To prove its point, AMD used a machine equipped with two 64-core EPYC 9575F CPUs, eight Instinct MI300X accelerators and 2,304 GB of DDR5 memory, and tested its inference performance with the Llama3.1-70B model (vLLM 0.6.1.post2, TP8 Parallel, FP8, continuous batching). Using a complex set of calculations, AMD determined the energy efficiency of this system and compared it to an undisclosed machine from 2020, concluding that the new machine is 28.3 times more energy efficient than the old one.
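AMD has not published the exact formula behind its 28.3x figure. Purely as an illustration, and assuming the metric behaves like a simple performance-per-watt ratio (for example, inference throughput divided by system power), the comparison and the remaining distance to the 30x target can be sketched as follows. The function names are hypothetical; only the 28.3x and 30x figures come from AMD.

```python
def perf_per_watt(throughput: float, power_watts: float) -> float:
    """Work per unit of energy, e.g. tokens/s per watt (equivalent to tokens per joule)."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return throughput / power_watts


def efficiency_gain(new_throughput: float, new_power: float,
                    old_throughput: float, old_power: float) -> float:
    """How many times more energy efficient the new system is than the old one.

    Assumption: a plain perf/W ratio, not AMD's undisclosed methodology.
    """
    return perf_per_watt(new_throughput, new_power) / perf_per_watt(old_throughput, old_power)


def remaining_factor(achieved: float, target: float) -> float:
    """Additional multiplicative improvement needed to go from `achieved` to `target`."""
    return target / achieved


ACHIEVED = 28.3  # AMD's reported gain for the EPYC 9575F + MI300X system vs. 2020
TARGET = 30.0    # the 30x25 goal

gap = remaining_factor(ACHIEVED, TARGET)
print(f"Remaining improvement needed: {gap:.3f}x (~{(gap - 1) * 100:.1f}%)")
```

Under this simplified reading, a further gain of only about 6% would take AMD from 28.3x to 30x.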
AMD does not disclose the specifications of its 2020 system, though we can assume that it was based on the company's EPYC 7002-series processors, which feature the Zen 2 microarchitecture and up to 64 cores per CPU, as well as Instinct MI100 accelerators based on the CDNA 1 architecture.
AMD's Instinct MI100 does not support FP8 (unlike the MI300X, which supports it at the same rate as INT8), but if we compare the INT8 performance of the MI100 (184.6 TOPS) and the MI300X (2,615 TOPS, or 5,230 TOPS with sparsity), the difference is 14 to 28 times on paper. About the same difference can be observed with FP16, so the comparison is valid. When we factor in the dramatically better memory subsystem (32 GB of HBM2 at 1.20 TB/s vs. 192 GB of HBM3 at 5.30 TB/s) and much better CPUs, it comes as no surprise that AMD's current machines are dramatically faster and more energy efficient than its systems from 2020.
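The on-paper ratios quoted above can be reproduced with a few lines of arithmetic. This sketch uses only the peak INT8 and memory figures cited in this article; peak specifications say nothing about how much of that throughput real workloads actually achieve.

```python
# Peak specifications as cited in this article.
MI100 = {"int8_tops": 184.6, "hbm_gb": 32, "bandwidth_tbs": 1.20}
MI300X = {"int8_tops": 2615.0, "int8_tops_sparse": 5230.0, "hbm_gb": 192, "bandwidth_tbs": 5.30}


def ratio(new: float, old: float) -> float:
    """How many times larger `new` is than `old`."""
    return new / old


dense = ratio(MI300X["int8_tops"], MI100["int8_tops"])
sparse = ratio(MI300X["int8_tops_sparse"], MI100["int8_tops"])
capacity = ratio(MI300X["hbm_gb"], MI100["hbm_gb"])
bandwidth = ratio(MI300X["bandwidth_tbs"], MI100["bandwidth_tbs"])

print(f"INT8 (dense):  {dense:.1f}x")
print(f"INT8 (sparse): {sparse:.1f}x")
print(f"HBM capacity:  {capacity:.1f}x")
print(f"HBM bandwidth: {bandwidth:.1f}x")
```

These are peak-throughput and capacity ratios for a single accelerator, whereas AMD's 28.3x figure is a system-level energy-efficiency result, so the two are not directly interchangeable.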
AMD itself says that, in addition to 'brute force' hardware improvements, its higher energy efficiency was achieved through a combination of architectural advances and software optimizations, which is to be expected.
Just recently, the company introduced its Instinct MI325X accelerators, which are also based on the CDNA 3 architecture but feature a 288 GB HBM3E memory subsystem. Next year, the company is set to roll out its Instinct MI355X accelerators, which will be based on the CDNA 4 architecture and will boost FP8 and FP16 compute performance by about 80% compared to the MI325X. In addition to FP8 and FP16, the MI355X will add support for the FP4 and FP6 formats for AI, which will increase its peak performance to 9.2 PetaFLOPS (FP4), something that will be useful for many large language models. All in all, AMD appears to be well on track to achieve a 30-fold increase in the energy efficiency of its compute platforms by 2025 compared to 2020.
"With our thoughtful approach to hardware and software co-design, we are confident in our roadmap to exceed the 30x25 goal and excited about the possibilities ahead, where we see a path to massive energy efficiency improvements within the next couple of years," wrote Sam Naffziger, Senior Vice President, AMD Corporate Fellow, and Product Technology Architect at AMD.
Anton Shilov
Contributing Writer
Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.