Most of today’s conversation around chip design focuses on shrinking chips to pack more performance into a given area. At the other end of this conversation is Cerebras Systems, the company that introduced the so-called “world’s largest computer chip,” the Wafer Scale Engine (WSE), back in 2019.

Cerebras’ second-generation WSE compared to a hockey puck. Image used courtesy of Cerebras Systems
 

The startup’s first WSE came in at about 8 inches by 9 inches and integrated roughly 1.2 trillion transistors into one chip. Now, the company is pushing the limits of this wafer even more with the release of its second-generation WSE—this time more than doubling the transistor count in the same area. 

Second-generation WSE 

Like the first-generation WSE, Cerebras’ second-generation WSE is a massive piece of silicon aimed at AI compute applications.

In an area of 46,225 mm², the Gen2 WSE integrates 2.6 trillion transistors and 850,000 AI-optimized cores. For reference, the world’s largest GPU has about 54 billion transistors, so the Gen2 WSE has over 2.5 trillion more transistors, or roughly 48 times as many.
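The comparison above is simple arithmetic, and a short sketch makes the scale easier to check. It uses only the figures quoted in this article; the function name and dictionary keys are illustrative, and the density figure is a plain average over the whole die rather than a process-node specification.

```python
# Figures quoted in the article.
WSE2_TRANSISTORS = 2.6e12   # 2.6 trillion transistors
GPU_TRANSISTORS = 54e9      # about 54 billion transistors (largest GPU)
WSE2_AREA_MM2 = 46_225      # die area in square millimetres


def transistor_comparison(wse=WSE2_TRANSISTORS, gpu=GPU_TRANSISTORS,
                          area_mm2=WSE2_AREA_MM2):
    """Return the absolute gap, the ratio, and the average density."""
    return {
        "difference": wse - gpu,            # how many more transistors
        "ratio": wse / gpu,                 # how many times as many
        "density_per_mm2": wse / area_mm2,  # average transistors per mm^2
    }


stats = transistor_comparison()
print(f"Difference: {stats['difference'] / 1e12:.3f} trillion transistors")
print(f"Ratio:      {stats['ratio']:.1f}x")
print(f"Density:    {stats['density_per_mm2'] / 1e6:.1f} million per mm^2")
```

Under these numbers the gap is a little over 2.5 trillion transistors, which matches the article's statement, and the average density works out to tens of millions of transistors per square millimetre.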

These transistors deliver a huge amount of capability: the Gen2 offers 40 GB of on-chip SRAM, 20 PB/s of memory bandwidth, and 220 Pb/s of fabric bandwidth. Based on these metrics, the Gen2 WSE roughly doubles the first-generation chip on every metric shown.

Gen1 vs. Gen2 WSE. Image used courtesy of Cerebras Systems and HPC Wire
 

Furthermore, Cerebras claims the new chip outpaces GPU competitors with:

  • 123x more compute cores
  • 1,000x more on-chip memory
  • 12,862x more memory bandwidth
  • 45,833x more fabric bandwidth
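The multipliers in this list can be reproduced as plain ratios. The sketch below is a sanity check, not Cerebras' own calculation. The WSE-2 figures come from this article; the GPU baseline is an assumption, since the article does not name the GPU or give its specifications. The values used are the commonly published figures for an NVIDIA A100 (6,912 cores, 40 MB of on-chip cache, 1,555 GB/s of memory bandwidth, 600 GB/s of interconnect bandwidth). Note that the fabric figure for the WSE-2 is in petabits, so the GPU figure is converted to bits before dividing.

```python
# WSE-2 figures quoted in the article.
WSE2 = {
    "cores": 850_000,
    "on_chip_memory_bytes": 40e9,        # 40 GB of SRAM
    "memory_bandwidth_bytes_s": 20e15,   # 20 PB/s
    "fabric_bandwidth_bits_s": 220e15,   # 220 Pb/s (petabits per second)
}

# ASSUMPTION: baseline GPU figures are not in the article. These are the
# commonly published NVIDIA A100 numbers, used here for illustration only.
GPU = {
    "cores": 6_912,
    "on_chip_memory_bytes": 40e6,         # 40 MB
    "memory_bandwidth_bytes_s": 1_555e9,  # 1,555 GB/s
    "fabric_bandwidth_bits_s": 600e9 * 8, # 600 GB/s converted to bits/s
}


def ratios(chip, baseline):
    """Divide each chip metric by the matching baseline metric."""
    return {name: chip[name] / baseline[name] for name in chip}


multipliers = ratios(WSE2, GPU)
for name, value in multipliers.items():
    print(f"{name}: {value:,.0f}x")
```

With this assumed baseline, the rounded ratios line up with the four multipliers in the list, which suggests the comparison was made against a GPU with these specifications.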

The increased performance and integration of the WSE-2 result from the move from a 16 nm node in Gen1 to TSMC’s 7 nm node for Gen2. Cerebras’ unique manufacturing process is the key to its high integration and seemingly fast time to market, considering that the company added 1.4 trillion transistors to its chip in two years.

Size of the WSE-2 compared to the largest GPU. Image used courtesy of Cerebras Systems
 

According to Cerebras, each WSE is created by cutting the largest possible square from a 300 mm wafer, which results in a chip of roughly 46,000 mm². The company builds its chips with a large amount of redundancy, repeating 84 identical tiles to form an array of silicon blocks.
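The geometry behind that figure can be checked with a few lines. The sketch below computes the largest square that fits inside a 300 mm circle and compares it with the 46,225 mm² area quoted earlier in the article. Treating the die as a perfect square is an assumption made only for this check; the variable and function names are illustrative.

```python
import math

WAFER_DIAMETER_MM = 300.0   # wafer size quoted in the article
QUOTED_AREA_MM2 = 46_225    # Gen2 WSE area quoted in the article


def inscribed_square(diameter_mm):
    """Side and area of the largest square inside a circle.

    The square's diagonal equals the circle's diameter, so
    side = diameter / sqrt(2) and area = diameter**2 / 2.
    """
    side = diameter_mm / math.sqrt(2)
    return side, side * side


side, area = inscribed_square(WAFER_DIAMETER_MM)

# ASSUMPTION: treat the quoted area as a perfect square to get its side.
quoted_side = math.sqrt(QUOTED_AREA_MM2)
quoted_diagonal = quoted_side * math.sqrt(2)

print(f"Ideal inscribed square: {side:.1f} mm side, {area:,.0f} mm^2")
print(f"Quoted area as square:  {quoted_side:.1f} mm side, "
      f"{quoted_diagonal:.1f} mm diagonal")
```

An ideal inscribed square comes to about 45,000 mm², slightly below the quoted area. A perfect square with the quoted area would have a diagonal a few millimetres longer than the wafer diameter, so the published figure presumably reflects a shape that is not a perfect sharp-cornered square. That reading is an inference from the arithmetic, not something the article states.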

More Transistors Doesn’t Always Mean Higher Efficiency

Throwing more transistors onto a single piece of silicon may sound like a magic bullet, but it really isn’t; adding more transistors to the chip will only improve performance if all of these transistors get used. The software running on the hardware will need to be optimized to utilize the system as a whole, leaving no transistor dormant.

Otherwise, the increased integration is just overhead.

Trend in dark silicon. Image used courtesy of Northwestern University
 

Beyond this, dark silicon is a major concern for highly dense ICs. Will the Gen2 WSE even be able to use all of its transistors simultaneously? Or will power and thermal density limits force parts of the chip to sit idle as dark silicon?

Ferrari-level Performance  

“One cannot put a Ferrari engine in a Volkswagen and expect Ferrari performance,” reads a Cerebras whitepaper. “Putting a faster chip in a general-purpose server cannot vastly accelerate a workload on its own—it simply moves the bottleneck.”

Every three and a half months, AI compute demand doubles: from 2015 to 2020 alone, the compute power required to train the largest deep learning models increased 300,000-fold. To meet these mounting demands, Cerebras Systems purpose-built the massive wafer to power its CS-2 AI supercomputer, which the company says is the fastest in the industry.
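The two growth figures in that paragraph can be cross-checked against each other. The sketch below assumes steady exponential growth with a fixed doubling period, which is a simplification, and uses only the numbers quoted above; the function names are illustrative.

```python
import math

DOUBLING_PERIOD_MONTHS = 3.5   # doubling period quoted in the article
QUOTED_GROWTH = 300_000        # growth factor quoted in the article


def doublings_needed(growth_factor):
    """Number of doublings required to reach a given growth factor."""
    return math.log2(growth_factor)


def growth_after(months, doubling_period_months=DOUBLING_PERIOD_MONTHS):
    """Growth factor after a number of months of steady doubling."""
    return 2 ** (months / doubling_period_months)


doublings = doublings_needed(QUOTED_GROWTH)
months_required = doublings * DOUBLING_PERIOD_MONTHS
five_year_growth = growth_after(60)

print(f"Doublings for 300,000x: {doublings:.1f}")
print(f"Time required:          {months_required:.0f} months")
print(f"Growth over 60 months:  {five_year_growth:,.0f}x")
```

Under this simplified model, a 300,000-fold increase takes a little over 18 doublings, or somewhat more than five years at a 3.5-month doubling period, so the two quoted figures are roughly consistent with the five-year window mentioned.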

This post was first published on: All About Circuits
