AMD Lays The Path To Zettascale Computing: Talks CPU & GPU Performance Plus Efficiency Trends, Next-Gen Chiplet Packaging & More

AMD talked about the future of computing, laying out its CPU & GPU trends in terms of efficiency & performance during the ISSCC 2023 conference.
AMD's CEO, Dr. Lisa Su, took the stage and opened the talk by highlighting the progress made over the past 10 years. At ISSCC 2013, AMD presented one of its earliest HSA APUs, Richland, which featured up to 1.3 billion transistors, 4 cores, 4 threads, 4 MB of total cache, and a monolithic 32nm SOI design. Fast forward to 2023, and AMD now offers 90 billion transistors, 96 cores, and 192 threads on a single chip built from 13 chiplets on 5nm and 6nm process nodes with 384 MB of cache.
That's significant progress for a single decade. Based on the performance trends of those years, the industry has been doubling mainstream server performance every 2.4 years, while GPU performance has doubled roughly every 2 years. AMD has already become the first company in the industry to break the Exascale barrier with the Frontier supercomputer, so the next goal is the even harder Zettascale mark.
At a performance doubling of 2x every 1.2 years, it will take slightly over 10 years to achieve Zettascale, and that's by taking advantage of all the technology available at the moment. Efficiency, however, is not following the same trajectory: per the CPU & GPU efficiency trends, progress is starting to flatten out. So while achieving Zettascale performance in the next 10 years or so looks feasible, it will come at a significant efficiency cost.
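As a back-of-the-envelope sketch of these trend lines (the doubling periods are from the talk; the 1,000x factor is simply the jump from Exascale at 10^18 FLOPS to Zettascale at 10^21 FLOPS):

```python
import math

def years_to_scale(growth_factor, doubling_period_years):
    """Years needed to grow performance by `growth_factor`,
    assuming performance doubles every `doubling_period_years`."""
    doublings = math.log2(growth_factor)
    return doublings * doubling_period_years

# Exascale (1e18 FLOPS) -> Zettascale (1e21 FLOPS) is a 1,000x jump.
print(f"{years_to_scale(1000, 1.2):.1f} years")  # ~12 years at 2x every 1.2 years
print(f"{years_to_scale(1000, 2.4):.1f} years")  # ~24 years at the server CPU trend
```

Taken at face value, the 1.2-year doubling works out to roughly 12 years for a full 1,000x jump; starting from a machine already past the Exascale mark, as Frontier is, brings that closer to the article's roughly ten-year projection.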
A Zettascale-class system at an efficiency of 2,140 GFLOPS/Watt is said to consume around 500 MW of power using the architectures the modern world currently has to offer. Two such systems would require a dedicated nuclear power plant with a capacity of 1,000 MW, or 1 GW. And that's assuming efficiency keeps doubling every 2.2 years.
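Those power figures are easy to verify with a quick calculation (a minimal sketch; the 2,140 GFLOPS/Watt efficiency figure is the one quoted in the talk):

```python
# Power draw of a Zettascale (1e21 FLOPS) machine at a given efficiency.
ZETTA_FLOPS = 1e21

def power_mw(gflops_per_watt):
    """Total power in megawatts for a Zettascale system."""
    watts = ZETTA_FLOPS / (gflops_per_watt * 1e9)
    return watts / 1e6

print(f"{power_mw(2140):.0f} MW")  # ~467 MW, i.e. "around 500 MW"
```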
So right off the bat, you can tell that something radical needs to happen. Logic density scaling is contributing to the problem, and there is a cost factor too: building top-tier chips means customers will pay much more than they did last generation. Furthermore, the I/O interconnect has also seen an overall flattening in energy per bit. Memory is yet another factor: as data sets grow larger, the demand for more capacity and bandwidth rises, which drives up both power and cost.
AMD aims to solve this by using the right compute technology for the right workload. Dr. Lisa Su says the largest lever in solving this efficiency crisis has been the advanced packaging technologies used on chips such as the Instinct MI250X and EPYC Genoa. Stacking and packaging chips together also reduces the relative bits-per-Joule cost of communication. So far, advanced packaging alone has provided a 50x reduction in communication power compared to when these chips were stand-alone parts placed far apart on a board.
The next evolution in this journey will come in the form of the AMD Instinct MI300, which places the cache and fabric dies at the bottom with CPU/GPU cores 3D-stacked on top of them, along with 2.5D integration of memory and the interconnect fabric. The MI300 accelerator also features a next-generation Unified Memory APU architecture, which lets the CPU and GPU cores share the same pool of fast HBM memory.
AMD MI250 Accelerator (CDNA 2 Coherent Memory Architecture):
AMD MI300 Accelerator (CDNA 3 Unified Memory APU Architecture):
Chips such as the MI300 will help AMD accelerate toward its 30x25 goal, which is to deliver a 30x efficiency improvement by 2025. There's more, though: AMD also discussed future packaging and chiplet architectures featuring even tighter integration of compute and memory at around 0.2 pJ/bit, plus Processing-In-Memory (PIM) designs that reduce memory access energy by up to 85%. AMD also revealed that it is working with DARPA on optical communication methods for energy-efficient long-reach links.
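To put the pJ/bit figure in perspective, here is a hypothetical calculation of interconnect power at a given bandwidth; the 0.2 pJ/bit target is from the talk, while the 10 pJ/bit comparison point is an assumed legacy figure chosen purely for contrast:

```python
def link_power_watts(pj_per_bit, bandwidth_gbytes_per_s):
    """Power needed to move data at a given energy-per-bit cost."""
    bits_per_s = bandwidth_gbytes_per_s * 1e9 * 8
    return pj_per_bit * 1e-12 * bits_per_s

# Moving 1 TB/s at the 0.2 pJ/bit target vs. an assumed 10 pJ/bit legacy link.
print(link_power_watts(0.2, 1000))  # ~1.6 W
print(link_power_watts(10, 1000))   # ~80 W
```

At these rates, energy per bit directly decides whether chip-to-chip traffic is a rounding error or a dominant share of the power budget.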
The highlight of the conference came in the form of a top-level block diagram of a future system-in-package architecture, which will play a key role in achieving Zettascale performance. The chip AMD showed uses advanced packaging to enable maximally efficient integration of compute elements and memory, with system-level communication accomplished through low-power, high-bandwidth optics.
High Performance Optical Receivers:
A 0.96pJ/b 7×50Gb/s-per-Fiber WDM Receiver with Stacked 7nm CMOS and 45nm Silicon Photonic Dies - AMD
Pay attention to this work!
— Underfox (@Underfox3) February 18, 2023
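The headline numbers in the cited receiver paper imply a total power that is easy to back out (a quick sketch using only the figures in the paper's title):

```python
# Implied power of the cited WDM receiver: 7 lanes x 50 Gb/s at 0.96 pJ/bit.
lanes = 7
gbps_per_lane = 50
pj_per_bit = 0.96

aggregate_bps = lanes * gbps_per_lane * 1e9          # total bits per second
receiver_mw = pj_per_bit * 1e-12 * aggregate_bps * 1e3  # watts -> milliwatts

print(f"{aggregate_bps / 1e9:.0f} Gb/s at {receiver_mw:.0f} mW")  # 350 Gb/s at 336 mW
```

Roughly a third of a watt for 350 Gb/s of optical ingress illustrates why co-packaged optics are attractive at this scale.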
The next-gen APU architecture features a mix of advanced 2D/2.5D/3D packaging technologies with a range of domain-specific accelerators, heterogeneous compute cores, a high-speed chip-to-chip interface (UCIe), co-packaged optics, memory layers, and more. In total, we can count 13 chiplets at the top level, though the finalized version we will see in the coming years could feature even more.
Overall, combining all of these technologies would allow AMD to build an impressive 10,000 GFLOPS/Watt Zettascale system within a 100 MW envelope, far less than the 500 MW a design on existing technologies would require. When it comes to chiplet and advanced packaging technologies, AMD is no doubt an industry leader, and the company might just become the first to break the Zettascale barrier, just as it did with Exascale.
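The 100 MW figure follows directly from the projected efficiency, using the same arithmetic as the 500 MW estimate earlier in the article:

```python
# Cross-check: a Zettascale (1e21 FLOPS) system at 10,000 GFLOPS/Watt.
watts = 1e21 / (10_000 * 1e9)   # projected efficiency in FLOPS per watt
print(f"{watts / 1e6:.0f} MW")  # 100 MW
```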