The widespread adoption of cloud services is driving massive data-center growth and demands enormous bandwidth. As data centers scale and more data must move between servers, interconnect scalability has been identified as a bottleneck to data-center growth.

This bottleneck is due, in part, to the fact that electrical switch technology is expected to hit a wall two generations from now (>25.6 Tbps) because pin density on the BGA package cannot be increased further, according to a Microsoft research team. Along with this, the imminent end of conventional Moore's Law is threatening the long-term viability of electronic switches.

Helios: a contemporary proposal for optically-switched networks. Image used courtesy of William M. Mellette et al.

In response, and inspired by the field of telecommunications (think fiber optics), the research community has been looking into optical switches as a solution. 

Why Optical Switches?

In theory, an all-optical data center network uses light both to transmit and to route all of the data between servers. This differs from conventional networks, which use electrical signals to transmit data and electronic switches to route it.

Example breakdown of data center power consumption. Image used courtesy of Huigui Rong et al.
 

In contrast to electrical switches, optical switches offer the high bandwidth required by modern applications while being significantly more energy efficient. Lower power consumption also means optical networks require fewer cooling resources. This is a valuable added benefit, as cooling can, in some cases, account for 40% of a data center's power consumption and a significant share of its floor space.

Limitations in Optical Switching

On the surface, optical switches seem like a great alternative to electronic switches. However, there are some significant limitations. One notable limitation is that, for proper operation, each server must continuously recover its clock timing from the incoming data. This clock and data recovery typically takes microseconds, which erases the benefit of nanosecond-scale optical switching.

UCL and Microsoft’s Breakthrough 

University College London (UCL) and Microsoft have recently announced a technique that they claim solves this problem entirely. This new technique, which they call "clock phase caching," brings clock recovery time down from microseconds to under 625 picoseconds.

The technique works by synchronizing the clocks of all connected servers via optical fiber, and programming memory hardware with the clock phase values. Storing this memory in a special cache (hence clock phase caching) negates the need to recheck clock time. Thus, the time to “recover” the clock is practically eliminated. 
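The mechanism above can be illustrated with a toy model. The sketch below is not the researchers' implementation; it simply contrasts paying a full clock-and-data-recovery (CDR) lock on every burst with looking up a cached clock phase instead. All timing constants are hypothetical placeholder values, not figures from the UCL/Microsoft work.

```python
# Toy model of clock phase caching (illustrative only; all numbers
# below are hypothetical assumptions, not measured results).
CDR_LOCK_TIME_NS = 2_000    # assumed full clock/data recovery: ~2 us
CACHED_LOOKUP_NS = 0.625    # assumed cached-phase retrieval: sub-ns

phase_cache = {}  # link id -> stored clock phase value

def sync_overhead_ns(link, use_cache):
    """Return the synchronization overhead (ns) for one burst on `link`."""
    if use_cache and link in phase_cache:
        # Phase already known for this link: skip CDR entirely.
        return CACHED_LOOKUP_NS
    # First contact (or caching disabled): pay full CDR, then store phase.
    phase_cache[link] = "recovered-phase"
    return CDR_LOCK_TIME_NS

# Total overhead across 1,000 bursts on a single link, with and without caching:
no_cache = sum(sync_overhead_ns("link-A", use_cache=False) for _ in range(1000))
with_cache = sum(sync_overhead_ns("link-B", use_cache=True) for _ in range(1000))
print(no_cache / 1000, with_cache / 1000)  # mean overhead per burst (ns)
```

In this toy model, only the first burst on a cached link pays the full recovery cost; every subsequent burst reuses the stored phase, so the average synchronization overhead collapses toward the lookup time.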

Chart showing the benefits in worst-case throughput overhead from clock phase caching. Image used courtesy of Kari A. Clark et al.

Lead researcher Kari Clark says: "Our research… has the potential to transform communication between computers in the cloud, making key future technologies like the internet of things and artificial intelligence cheaper, faster and consume less power."

 

What’s Next for Data Centers?

Up to now, data centers have kept pace with rapid growth thanks to Moore's Law-style scaling in networking. With the end of that scaling in sight, researchers are actively searching for new solutions. With this news, it's apparent that we're moving in the right direction. While there are still hurdles to overcome, this announcement represents significant progress toward all-optical data centers.


If you’re familiar with electrical-switch-based data centers, what advantages or disadvantages do you see in optical switching? Share your thoughts in the comments below.

Source: All About Circuits