Most of the conversation today around chip design focuses on how we can make chips smaller to pack more punch into one area. On the other end of this conversation is Cerebras Systems, a company that invented the so-called “world’s largest computer chip,” the Wafer Scale Engine (WSE), back in 2019.
Cerebras’ second-generation WSE compared to a hockey puck. Image used courtesy of Cerebras Systems
The startup’s first WSE came in at about 8 inches by 9 inches and integrated roughly 1.2 trillion transistors into one chip. Now, the company is pushing the limits of this wafer even more with the release of its second-generation WSE—this time more than doubling the transistor count in the same area.
Like the first-generation WSE, Cerebras’ second-generation WSE is a massive piece of silicon aimed at AI compute applications.
In an area of 46,225 mm², the Gen2 WSE integrates 2.6 trillion transistors and 850,000 AI-optimized cores. For reference, the world’s largest GPU has about 54 billion transistors, which means the Gen2 WSE carries more than 2.5 trillion additional transistors, or roughly 48 times as many.
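The comparison is simple arithmetic and can be checked in a few lines. The sketch below uses only the two transistor counts quoted in this article; the function and variable names are illustrative, not anything published by Cerebras.

```python
# Figures quoted in the article.
WSE2_TRANSISTORS = 2.6e12   # Gen2 WSE: 2.6 trillion
GPU_TRANSISTORS = 54e9      # "world's largest GPU": about 54 billion


def transistor_gap(wse, gpu):
    """Return (absolute difference, ratio) between two transistor counts."""
    return wse - gpu, wse / gpu


difference, ratio = transistor_gap(WSE2_TRANSISTORS, GPU_TRANSISTORS)
print(f"difference: {difference / 1e12:.3f} trillion transistors")
print(f"ratio: about {ratio:.0f}x")
```

The absolute gap and the ratio describe the same comparison from two angles. The ratio is arguably the more useful number, because it puts the transistor count on the same footing as the "x more" multiples Cerebras quotes for cores and bandwidth.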
These transistors deliver a huge amount of functionality: the Gen2 offers 40 GB of SRAM, 20 PB/s of memory bandwidth, and 220 Pb/s of fabric bandwidth. On every metric shown, the Gen2 WSE roughly doubles the first-generation chip.
Gen1 vs. Gen2 WSE. Image used courtesy of Cerebras Systems and HPC Wire
Furthermore, Cerebras claims the new chip outpaces GPU competitors with:
- 123x more compute cores
- 1,000x more on-chip memory
- 12,862x more memory bandwidth
- 45,833x more fabric bandwidth
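One way to sanity-check these multiples is to divide each Gen2 figure quoted above by the claimed multiple and see what GPU baseline falls out. The sketch below does that. The implied GPU numbers are derived here and are not stated in the article, and the assumption that "GB" and "PB" are decimal units is mine.

```python
# Gen2 WSE figures quoted in the article, converted to base units.
# Assumption: decimal prefixes (1 GB = 1e9 bytes, 1 PB = 1e15 bytes).
wse2 = {
    "compute cores": 850_000,      # cores
    "on-chip memory": 40e9,        # bytes (40 GB)
    "memory bandwidth": 20e15,     # bytes per second (20 PB/s)
    "fabric bandwidth": 220e15,    # bits per second (220 Pb/s)
}

# Multiples claimed by Cerebras relative to GPU competitors.
claimed_multiple = {
    "compute cores": 123,
    "on-chip memory": 1_000,
    "memory bandwidth": 12_862,
    "fabric bandwidth": 45_833,
}

# GPU baseline implied by each claim: WSE-2 figure / claimed multiple.
implied_gpu = {name: wse2[name] / claimed_multiple[name] for name in wse2}

print(f"cores:            {implied_gpu['compute cores']:,.0f}")
print(f"on-chip memory:   {implied_gpu['on-chip memory'] / 1e6:.0f} MB")
print(f"memory bandwidth: {implied_gpu['memory bandwidth'] / 1e12:.3f} TB/s")
print(f"fabric bandwidth: {implied_gpu['fabric bandwidth'] / 1e12:.2f} Tb/s")
```

Under those assumptions the implied baseline is a GPU with about 6,900 cores, 40 MB of on-chip memory, about 1.55 TB/s of memory bandwidth, and about 4.8 Tb/s of interconnect bandwidth. That shows the multiples compare the whole wafer against a single GPU die.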
The increased performance and integration of the WSE-2 are a result of the move from a 16 nm node in Gen1 to TSMC’s 7 nm node for Gen2. Cerebras’ unique manufacturing process is the key to its high integration and seemingly fast time to market, considering that the company added 1.4 trillion transistors to its chip in two years.
Size of the WSE-2 compared to the largest GPU. Image used courtesy of Cerebras Systems
According to Cerebras, the WSEs are created by cutting the largest possible square from a 300 mm wafer, which results in a chip of roughly 46,000 mm². The company builds in a large amount of redundancy, repeating 84 identical tiles to create an array of silicon blocks.
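The geometry behind that description can be worked through with the numbers in this article. The sketch below compares the largest sharp-cornered square that fits inside a 300 mm circle with the 46,225 mm² area quoted earlier, then divides that area by the 84 tiles. Treating the wafer as an ideal circle and splitting the area evenly across tiles are simplifying assumptions, not details from Cerebras.

```python
import math

WAFER_DIAMETER_MM = 300.0    # wafer size quoted in the article
QUOTED_AREA_MM2 = 46_225.0   # Gen2 WSE area quoted in the article
TILE_COUNT = 84              # identical tiles quoted in the article

# Largest square inscribed in a circle: its diagonal equals the diameter.
inscribed_side = WAFER_DIAMETER_MM / math.sqrt(2)
inscribed_area = inscribed_side ** 2

# Side and diagonal of a square with the quoted area.
quoted_side = math.sqrt(QUOTED_AREA_MM2)
quoted_diagonal = quoted_side * math.sqrt(2)

# Assumption: the quoted area is shared evenly by the tiles.
area_per_tile = QUOTED_AREA_MM2 / TILE_COUNT

print(f"inscribed square: {inscribed_side:.1f} mm side, {inscribed_area:,.0f} mm^2")
print(f"quoted area:      {quoted_side:.1f} mm side, {quoted_diagonal:.1f} mm diagonal")
print(f"area per tile:    {area_per_tile:.1f} mm^2")
```

A strictly inscribed square comes to about 45,000 mm², while the quoted 46,225 mm² corresponds to a 215 mm side with a diagonal slightly longer than 300 mm. One possible reading is that the corners are clipped or rounded to fit the wafer, but the article does not say, so take that as a guess. The per-tile figure of about 550 mm² likewise ignores any spacing between tiles.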
More Transistors Doesn’t Always Mean Higher Efficiency
Throwing more transistors onto a single piece of silicon may sound like a magic bullet, but it really isn’t; adding more transistors to the chip will only improve performance if all of these transistors get used. The software running on the hardware will need to be optimized to utilize the system as a whole, leaving no transistor dormant.
Otherwise, the increased integration is just overhead.
Trend in dark silicon. Image used courtesy of Northwestern University
Beyond this, dark silicon (transistors that must stay powered down to keep a chip within its power and thermal limits) is a large concern for highly dense ICs. Will the Gen2 WSE even be able to use all of its transistors simultaneously? Or will power and thermal density issues force designers to leave parts of the chip dark?
“One cannot put a Ferrari engine in a Volkswagen and expect Ferrari performance,” reads a Cerebras whitepaper. “Putting a faster chip in a general-purpose server cannot vastly accelerate a workload on its own—it simply moves the bottleneck.”
Every three and a half months, AI compute demand doubles: from 2015 to 2020 alone, the compute power required to train the largest deep learning models increased 300,000-fold. To meet these mounting demands, Cerebras Systems purpose-built the massive wafer to power its CS-2 AI supercomputer, which the company says is the fastest in the industry.
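The doubling rate and the 300,000-fold figure can be related with basic exponential-growth arithmetic. The sketch below assumes steady doubling and treats 2015 to 2020 as exactly 60 months. Both are simplifications for illustration, and the variable names are not from any Cerebras material.

```python
import math

DOUBLING_PERIOD_MONTHS = 3.5   # doubling rate quoted in the article
GROWTH_FACTOR = 300_000        # growth quoted in the article
SPAN_MONTHS = 5 * 12           # assumption: 2015 to 2020 is exactly 5 years

# Doublings needed to reach the quoted growth: 2**n = GROWTH_FACTOR.
doublings_needed = math.log2(GROWTH_FACTOR)

# Time those doublings take at the quoted doubling rate.
months_needed = doublings_needed * DOUBLING_PERIOD_MONTHS

# Growth that steady doubling would produce over the assumed span.
growth_in_span = 2 ** (SPAN_MONTHS / DOUBLING_PERIOD_MONTHS)

print(f"doublings needed for 300,000x: {doublings_needed:.1f}")
print(f"time at 3.5 months each:       {months_needed / 12:.1f} years")
print(f"growth over 60 months:         about {growth_in_span:,.0f}x")
```

Under these assumptions, 300,000-fold growth takes about 18 doublings, or a little over five years at 3.5 months each. That makes the two figures roughly consistent with each other, although steady doubling over exactly 60 months would give closer to 145,000-fold.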