Northwestern Engineering

# **Towards Energy-Proportional Optical Interconnects**

**Nikos Hardavellas**, Northwestern University

Yigit Demir, Computational Lithography, Intel

> OPTICS Workshop March 18<sup>th</sup>, 2016

partially supported by NSF award CCF-1453853

### **M**<sup>C</sup>Cormick


# **Photonics Need High Power Lasers**

- Emergence of photonics
  - High bandwidth, low latency, energy efficient
  - Wide range of apps: manycores, multi-chip, datacenters
- However, lasers are really power-hungry
  - Optical devices induce optical loss (13+ dB is typical)
  - WDM-compatible lasers are 5-30% efficient
  - → 10-20x higher power than required optical output


## Most of the Laser Power is Wasted

Demir & Hardavellas [HPCA'15] [NOCS'15] [SPIE'15] [IPC'14] [ISLPED'14]

- Interconnect may stay idle for long times
  - Compute-intensive execution phases of workloads
  - 30% server utilization in data centers [Barroso 2007]
- But laser stays always on!
  - ...even during periods of interconnect inactivity
- Up to 94% laser energy waste in real-world workloads


© Hardavellas


# **Proposed Solution: Laser Power-Gating**

- Turn the lasers off when interconnect is idle
- Turn the lasers on before sender transmits
  - This may be tricky... needs early warning or predictive schemes
- Overlooked until recently
  - Traditional comb lasers are slow to turn on
- New enabling technology: Fast on-off switching on-chip lasers
  - InP, Ge, ... Turn on/off in 1.5–2 ns
  - On-chip → simplify design and lower cost
  - [HPCA'15] [SPIE'15] [IPC'14] [ISLPED'14]
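
The trade-off in the bullets above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the idle-timeout policy, the function `simulate_gating`, and the example trace are hypothetical (the slides do not specify a turn-off policy), and the 2 ns turn-on time is the upper end of the 1.5–2 ns range quoted above. It models the case with no early warning, so every cold start exposes the full turn-on delay to the sender.

```python
from dataclasses import dataclass

TURN_ON_NS = 2.0  # worst-case laser turn-on time quoted above (1.5-2 ns)


@dataclass
class GatingResult:
    laser_on_ns: float  # total time the laser was powered
    stall_ns: float     # total time senders waited for the laser to turn on


def simulate_gating(transfers, idle_timeout_ns):
    """Replay (start_ns, duration_ns) transfers, sorted by start time,
    against a laser that switches off after `idle_timeout_ns` of inactivity.
    With no early warning, a sender that finds the laser off stalls for
    TURN_ON_NS before it can transmit.
    """
    on_ns = 0.0
    stall_ns = 0.0
    on_start = None   # start of the current powered interval (None: never on)
    off_at = 0.0      # time at which the current powered interval ends
    busy_until = 0.0  # time at which the link finishes its current transfer
    for start, duration in transfers:
        if on_start is None or start >= off_at:
            # Laser is off: close the previous powered interval, power up,
            # and expose the turn-on delay to the sender.
            if on_start is not None:
                on_ns += off_at - on_start
            on_start = start
            begin = start + TURN_ON_NS
            stall_ns += TURN_ON_NS
        else:
            # Laser is still on: send as soon as the link is free.
            begin = max(start, busy_until)
        busy_until = begin + duration
        off_at = busy_until + idle_timeout_ns
    if on_start is not None:
        on_ns += off_at - on_start
    return GatingResult(on_ns, stall_ns)


# Hypothetical trace: three 10 ns transfers, the third after a long idle gap.
result = simulate_gating([(0, 10), (15, 10), (100, 10)], idle_timeout_ns=5)
always_on_ns = 117.0  # an always-on laser is powered for the whole window
savings = 1 - result.laser_on_ns / always_on_ns
print(result, f"laser-on time saved: {savings:.0%}")
```

In this toy trace the laser is powered for 47 ns of a 117 ns window, at the cost of 4 ns of exposed turn-on stalls. A predictive or early-warning scheme aims to keep the savings while hiding those stalls.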


# **ProLaser: Energy-Proportional Photonic Nets.**

- Power saving mechanism for photonic interconnects
  - Laser power-gating
    - → Independent power gating for data and control bits
    - → Predicts laser turn-on
    - → Saved power can be used by the cores
- Result highlights
  - Laser energy reduction: 42–88% (61% on avg.)
  - Processor energy reduction: 35–52% (40% on avg.)
  - Leads to 50–73% speedup (60% on avg.)
  - Within 2–6% of the theoretical maximum savings












# **Laser Control Co-design with Coherence Protocol**

[EcoLaser+, Demir & Hardavellas, SPIE'15]

- Anticipates laser activation
  - Correlates cache coherence requests to replies
  - Activates laser early → hides laser turn-on delay
- Which laser / plane to turn on?
  - Predict cache miss → turn on requestor's control plane
  - Request to directory → turn on directory's control plane
  - Directory forwarding → turn on owner's control+data plane
  - etc... (including memory controller)
- When to turn it on?
  - Turn on the laser just 1.5 ns before the payload is ready
  - Minimum latency for each operation
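
The event-to-plane mapping and the timing rule above could be sketched roughly as follows. This is an illustrative sketch, not the EcoLaser+ implementation: the `Event` enum, the `ACTIVATION` table, and `schedule_turn_on` are hypothetical names, only the three events listed on the slide are covered, and the time at which the payload becomes ready is assumed to be known in advance.

```python
from enum import Enum, auto

LASER_TURN_ON_NS = 1.5  # lead time quoted on the slide


class Event(Enum):
    PREDICTED_MISS = auto()
    REQUEST_TO_DIRECTORY = auto()
    DIRECTORY_FORWARD = auto()


# Which node's laser planes to activate for each coherence event,
# following the bullets above (hypothetical encoding).
ACTIVATION = {
    Event.PREDICTED_MISS: ("requestor", ("control",)),
    Event.REQUEST_TO_DIRECTORY: ("directory", ("control",)),
    Event.DIRECTORY_FORWARD: ("owner", ("control", "data")),
}


def schedule_turn_on(event, payload_ready_ns, now_ns=0.0):
    """Return (node, planes, turn_on_at_ns, exposed_delay_ns).

    The laser is switched on LASER_TURN_ON_NS before the payload is ready,
    or immediately if that moment has already passed. `exposed_delay_ns`
    is the part of the turn-on time that could not be hidden.
    """
    node, planes = ACTIVATION[event]
    turn_on_at = max(now_ns, payload_ready_ns - LASER_TURN_ON_NS)
    exposed = max(0.0, turn_on_at + LASER_TURN_ON_NS - payload_ready_ns)
    return node, planes, turn_on_at, exposed


# Enough warning: the turn-on delay is fully hidden.
print(schedule_turn_on(Event.DIRECTORY_FORWARD, payload_ready_ns=10.0))
# Too little warning: part of the turn-on delay is exposed.
print(schedule_turn_on(Event.PREDICTED_MISS, payload_ready_ns=1.0))
```

Turning the laser on as late as possible keeps the powered time short, while the early trigger from the coherence protocol is what allows the turn-on delay to be hidden.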














## **Conclusion**

- Problem: lasers are power-hungry, and most of that power is wasted
- Our solution: laser power-gating (ProLaser, SLaC, EcoLaser (+), LaC)
- Significant energy reduction
  - Laser: 42–88% (61% on avg.), Processor: 35–52% (40% on avg.)
  - Within 2–6% of the theoretical maximum savings
  - Power reduction leads to speedups: 50–73% (60% on avg.)
- Applicable to a wide range of scales (on chip, multichip, datacenter)

Thank you! Questions?






# **Experimental Methodology**

| Parameter         | Value                                                                                                          |
|-------------------|----------------------------------------------------------------------------------------------------------------|
| CMP Size          | 64 cores, 580 mm<sup>2</sup>                                                                                   |
| Core              | UltraSPARC III ISA, up to 5 GHz, OoO, 4-wide dispatch/retirement, 96-entry ROB                                 |
| L1 Cache          | Split I/D, 64 KB 2-way, 2-cycle load-to-use, 2 ports, 64-byte blocks, 32 MSHRs, 16-entry victim cache          |
| L2 Cache          | 512 KB per core, 16-way, 64-byte blocks, 14-cycle hit, 32 MSHRs, 16-entry victim cache                         |
| Memory Controller | One per 4 cores, 1 channel per memory controller, round-robin page interleaving                                |
| Main Memory       | Optically connected memory [3], 10 ns access                                                                   |
| Network           | R-SWMR radix-16 crossbar and Firefly, 300-bit-wide links @ 10 GHz, 20-flit-deep buffers, 3-cycle router delay  |

# **Nanophotonic Parameters**

Radix-16 SWMR

| Parameter                      | Per Unit  | Total (On-Chip Laser) | Total (Off-Chip Laser) |
|--------------------------------|-----------|-----------------------|------------------------|
| DWDM                           |           | 64                    | 64                     |
| Splitter                       | 0.2 dB    | 0.6 dB                | 0.6 dB                 |
| WG Loss                        | 0.3 dB/cm | 3 dB                  | 3 dB                   |
| Nonlinearity                   | 1 dB      | 1 dB                  | 1 dB                   |
| Modulator Ins.                 | 0.5 dB    | 0.5 dB                | 0.5 dB                 |
| Ring Through                   | 0.01 dB   | 10.24 dB              | 10.24 dB               |
| Filter Drop                    | 1.2 dB    | 1.2 dB                | 1.2 dB                 |
| Coupler                        | 2 dB      |                       | 4 dB                   |
| **Total Loss**                 |           | 16.64 dB              | 20.64 dB               |
| Detector                       |           | -20 dBm               | -20 dBm                |
| **Laser Power Per Wavelength** |           | 0.461 mW              | 1.158 mW               |
| Total Laser Power (15% Eff.)   |           | 14.8 W                | 37.1 W                 |
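
The laser power figures above follow from the total loss, the detector sensitivity, and the laser efficiency. The sketch below reproduces them to within rounding. The function names are illustrative, and the wavelength count of 4800 (radix 16 times 300-bit links) is an assumption inferred from the network configuration, not a number stated on the slide.

```python
def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


def laser_power_per_wavelength_mw(total_loss_db, detector_dbm=-20.0):
    """Optical power the laser must emit so that, after `total_loss_db`
    of loss, the detector still receives `detector_dbm`."""
    return dbm_to_mw(detector_dbm + total_loss_db)


def total_laser_power_w(per_wavelength_mw, num_wavelengths, efficiency=0.15):
    """Electrical power drawn by the lasers at the given efficiency."""
    return per_wavelength_mw * num_wavelengths / 1000 / efficiency


# ASSUMPTION: 16 (radix) x 300 (link width in bits) wavelengths in total.
NUM_WAVELENGTHS = 16 * 300

for name, loss_db in [("on-chip", 16.64), ("off-chip", 20.64)]:
    per_wl = laser_power_per_wavelength_mw(loss_db)
    total = total_laser_power_w(per_wl, NUM_WAVELENGTHS)
    print(f"{name}: {per_wl:.3f} mW per wavelength, {total:.1f} W total")
```

The 4 dB of extra loss in the off-chip case raises the required laser power by a factor of about 2.5, which matches the ratio between the two totals in the table.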


# **Workloads**

- Fmm: Input 128K
- Moldyn: 15, 20, 3.2 M
- Barnes: Input 64K
- Tomcatv: 4096, 10
- Appbt: in.24x24x24x8bit
- Ocean: 1026, 9600
- Em3d: 400K, 2, 15, 5
- Bodytrack














