



# A System-Level Perspective on Silicon Photonic Network-on-Chips

Workshop on Optical/Photonic Interconnects for Computing Systems, 2019

Aditya Narayan, Ayse K. Coskun, Boston University



# On-chip communication challenges in manycore systems

Due to technology scaling & higher computation needs, more resources are integrated on-chip





[Figure: frequency (MHz) and number of logical cores versus year, 1970 to 2020, logarithmic vertical axis]

# Photonic Network-on-chips (PNoCs)

#### **Benefits with PNoCs**

- Wavelength-division multiplexing
  - → Higher bandwidth
- Lower data-dependent energy consumption
  - → 0.42pJ/bit for modulation, 0.18pJ/bit for drivers [Zheng et al. Optics Express'11]
- Lower latencies with silicon waveguides
  - → Data rate density of 320Gbps/µm [Batten et al. Micro'09]

#### **Challenges in PNoCs**

- High sensitivity to thermal and process variations
  - → Higher thermal tuning power
- High PNoC power with more laser wavelengths
- Higher optical loss and lower laser efficiency
  - → Increased laser source power

# System-level optimizations enable efficient PNoC integration

#### **Energy-efficient computing with PNoCs**

System-level optimization by cross-layer modeling of device, design and architecture parameters



## Manycore systems with PNoCs

#### Monolithic-integrated PNoC



#### 2.5D-integrated PNoC



#### Outline

- **Design optimization:** Floorplan optimization for PNoCs [DATE'16]
- **Workload allocation:** FreqAlign, thread allocation and migration [TCAD'17]
- **Wavelength selection:** WAVES, minimal laser wavelength selection [DATE'19]

### Floorplan optimization for PNoCs

The PNoC floorplan optimization flow is aware of on-chip thermal variations, which are derived from various power profiles.



# Floorplan optimization formulation



- System is formed by tiles
- PNoC is represented by
  - → clusters of tiles
  - → location of router groups
  - → the waveguides

#### **Minimize:** $\alpha \cdot P_{PNoC} + \beta \cdot AREA_{PNoC}$

#### Subject to:

$$\sum_{r \in R,\, q \in Q,\, f \in \{0,1\}} \gamma_{frq}^{c} = 1, \qquad \forall c \in C, \quad \gamma_{frq}^{c} \in \{0,1\}$$

$$r_c = \sum_{r \in R,\, q \in Q,\, f \in \{0,1\}} r \cdot \gamma_{frq}^{c}, \quad q_c = \sum_{r \in R,\, q \in Q,\, f \in \{0,1\}} q \cdot \gamma_{frq}^{c}, \quad \forall c \in C$$

Router group related constraints

$$f_c = \sum_{r \in R,\, q \in Q,\, f \in \{0,1\}} f \cdot \gamma_{frq}^{c}, \quad \forall c \in C$$

$$\begin{split} o_{crq} &= \sum_{r' \in R,\, q' \in Q,\, f \in \{0,1\}} o_{fr'q'}(r, q)\, \gamma_{fr'q'}^{c}, \quad \forall c \in C \\ \sum_{c \in C} o_{crq} &\leq 1, \qquad \forall q \in Q,\ r \in R \end{split}$$

Tile and cluster related constraints

$$2v_{rq}^{n} - e_{hrq-1}^{n} - e_{vr-1q}^{n} - e_{hrq}^{n} - e_{vrq}^{n} - \sum_{f \in \{0,1\}} \gamma_{frq}^{s_{n}} - \sum_{f \in \{0,1\}} \gamma_{frq}^{t_{n}} = 0, \qquad \forall n \in N,\ r \in R,\ q \in Q$$

Path related constraints
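The first three constraints can be read as a one-hot selection: for each cluster $c$, exactly one $\gamma_{frq}^{c}$ equals 1, and the weighted sums then reduce to the selected values $f_c$, $r_c$, $q_c$. A minimal Python sketch of that reading, for a single cluster on a toy 4 x 4 index set. The function name `decode_cluster` and the dictionary encoding of $\gamma$ are illustrative assumptions, not part of the original optimization flow.

```python
from itertools import product

def decode_cluster(gamma, R, Q):
    """Check the one-hot constraint for one cluster and recover (f_c, r_c, q_c).

    gamma maps (f, r, q) -> 0 or 1, mirroring gamma^c_{frq} for a fixed cluster c.
    Missing keys are treated as 0.
    """
    options = list(product((0, 1), R, Q))
    # Sum over f, r, q of gamma must equal 1
    if sum(gamma.get(opt, 0) for opt in options) != 1:
        raise ValueError("cluster must select exactly one (f, r, q) option")
    # Because gamma is one-hot, each weighted sum equals the selected value
    f_c = sum(f * gamma.get((f, r, q), 0) for f, r, q in options)
    r_c = sum(r * gamma.get((f, r, q), 0) for f, r, q in options)
    q_c = sum(q * gamma.get((f, r, q), 0) for f, r, q in options)
    return f_c, r_c, q_c

# Hypothetical example: the cluster selects f = 1, r = 2, q = 3
gamma = {(1, 2, 3): 1}
print(decode_cluster(gamma, range(4), range(4)))  # (1, 2, 3)
```

An actual solver would treat $\gamma$ as binary decision variables. The sketch only checks a given assignment.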

# Optimization flow



# Cross-layer PNoC PnR optimization

#### Optimized PNoC layouts for different power profiles and laser wall-plug efficiency




### Target manycore system

256-core system with Clos network

Core architecture: IA-32 core in Intel SCC [Howard, ISSCC2011], 16KB I/D L1 cache and 256KB L2 cache

Average power consumption: 1.166W



# Workload allocation to mitigate MRR resonance shifts

- The resonant frequency shifts because of process variations (PV), i.e., device variability and geometric aberrations
- The resonant frequency shifts because of temperature variations (TV)



#### System-level goal

- Minimize the difference among MRR temperatures
- Reduce the overall chip temperature
- Minimize the impact of process variations on MRR resonance shift
- The optical frequencies of on-chip laser sources must also match the resonant frequencies of the corresponding MRRs

### Runtime thermal management policies



Effectively reduces the RG temperature gradient, which results in a low resonant frequency gradient

#### FreqAlign



- Create an *M × N* weight matrix
  - Each entry is the steady-state temperature impact of core j on RG i per unit of power
  - Update the weight matrix with the impact of process variations
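The weight matrix can be illustrated as follows: if `W[i][j]` is the steady-state temperature rise of RG i per watt dissipated in core j, RG temperatures follow from a matrix-vector product, and a candidate allocation can be scored by the resulting temperature spread. The sketch below uses made-up weights and an assumed baseline temperature `T_AMBIENT`. The greedy placement is an illustrative stand-in, not the FreqAlign policy itself, and the process-variation update is not modeled.

```python
T_AMBIENT = 45.0  # assumed baseline temperature in deg C (illustrative)

def rg_temperatures(W, core_power):
    """T_i = T_AMBIENT + sum_j W[i][j] * P_j for each router group i."""
    return [T_AMBIENT + sum(w * p for w, p in zip(row, core_power)) for row in W]

def spread(temps):
    """Difference between the hottest and coolest router group."""
    return max(temps) - min(temps)

def greedy_allocate(W, n_threads, thread_power):
    """Place threads one at a time on the free core that minimizes RG temperature spread."""
    n_cores = len(W[0])
    if n_threads > n_cores:
        raise ValueError("more threads than cores")
    power = [0.0] * n_cores
    for _ in range(n_threads):
        best_core, best_spread = None, float("inf")
        for j in range(n_cores):
            if power[j] > 0.0:
                continue  # core already occupied
            power[j] = thread_power  # tentatively place the thread
            s = spread(rg_temperatures(W, power))
            power[j] = 0.0  # undo the tentative placement
            if s < best_spread:
                best_core, best_spread = j, s
        power[best_core] = thread_power
    return power

# 2 RGs x 4 cores: cores 0-1 sit near RG 0, cores 2-3 near RG 1 (made-up weights, deg C per W)
W = [[2.0, 1.5, 0.5, 0.2],
     [0.2, 0.5, 1.5, 2.0]]
allocation = greedy_allocate(W, n_threads=2, thread_power=1.0)
print(allocation, spread(rg_temperatures(W, allocation)))
```

With these weights, the two threads land on cores that heat the two RGs symmetrically, so the RG temperatures match.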

### Frequency tuning techniques

#### **Target frequency tuning (TFT)**

All RGs and laser sources are first tuned to their optical frequencies at the temperature threshold of the target manycore system (90°C)

#### Limitations

- All RGs and lasers have to be tuned
- System is underutilized





#### Adaptive frequency tuning (AFT)

Set the lowest frequency among the RGs as the target frequency and tune all the other devices to this target frequency

→ Target frequency is adaptive
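To make the contrast between TFT and AFT concrete, the sketch below compares the total frequency shift needed under a fixed target and under an adaptive target set to the lowest RG frequency. All frequency values are invented for illustration. Total shift is used only as a rough proxy for tuning effort, and the mapping from shift to tuning power is not modeled.

```python
def total_shift(frequencies, target):
    """Sum of absolute frequency shifts needed to bring every device to the target."""
    return sum(abs(f - target) for f in frequencies)

def tft_target(freq_at_threshold):
    """Target frequency tuning: a fixed target tied to the temperature threshold."""
    return freq_at_threshold

def aft_target(frequencies):
    """Adaptive frequency tuning: the lowest current frequency among the RGs."""
    return min(frequencies)

# Hypothetical RG resonant frequencies, as offsets in GHz from a reference
rg_freqs = [-20.0, -12.0, -15.0, -8.0]
fixed_target = -60.0  # hypothetical offset at the 90 deg C threshold

print(total_shift(rg_freqs, tft_target(fixed_target)))  # 185.0
print(total_shift(rg_freqs, aft_target(rg_freqs)))      # 25.0
```

In this toy case the adaptive target leaves one RG untuned and shortens every other shift, which is the intuition behind AFT.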



# Experimental results



- Compared to RingAware, FreqAlign reduces the resonant frequency difference by 60.6% on average
- Compared to RingAware + TFT, FreqAlign + AFT reduces the tuning power by 14.93W on average


# Challenges in designing energy-efficient PNoCs

- Increased PNoC power for higher aggregate PNoC bandwidth
  - → 1.5pJ per bit for 600Gbps line rate [Bahadori et al. DATE'17]
- High sensitivity of optical devices to thermal variations (TV) and process variations (PV)
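As a worked example of the energy figure quoted above, energy per bit multiplied by line rate gives power, so 1.5pJ per bit at 600Gbps corresponds to 0.9W. A small sketch of the unit conversion (the helper name is illustrative):

```python
def link_power_watts(energy_pj_per_bit, rate_gbps):
    """Power (W) = energy per bit (J/bit) * line rate (bit/s)."""
    return energy_pj_per_bit * 1e-12 * rate_gbps * 1e9

print(round(link_power_watts(1.5, 600), 3))  # 0.9
```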







PNoC power increases with the increasing laser wavelengths in the system

The resonant wavelength of MRRs shifts due to TV and PV

## Wavelength selection (WAVES)



- Identify $\lambda_{min}$ for an application to provide minimal performance loss
  - → Set performance loss threshold ($L_{thr}$)
- Activate the best combination of $\lambda_{min}$ wavelengths, accounting for the on-chip TV and PV
- Cross-layer simulation framework to model the system performance and PNoC power
  - → Explore device-level MRR locking under different system-level constraints
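The first step, picking the minimum wavelength count under a performance-loss threshold, can be sketched as a simple scan. The performance numbers are invented, and the function name and linear scan are illustrative assumptions rather than the WAVES implementation. The sketch leaves out the second step of choosing which specific wavelengths to activate given TV and PV.

```python
def select_lambda_min(perf_by_count, loss_threshold):
    """Return the smallest wavelength count whose performance loss, relative to
    using all wavelengths, does not exceed loss_threshold (a fraction, e.g. 0.05)."""
    lambda_tot = max(perf_by_count)
    best_perf = perf_by_count[lambda_tot]
    for count in sorted(perf_by_count):
        loss = 1.0 - perf_by_count[count] / best_perf
        if loss <= loss_threshold:
            return count
    return lambda_tot

# Hypothetical normalized performance for 1..6 active wavelengths
perf = {1: 0.55, 2: 0.78, 3: 0.91, 4: 0.97, 5: 0.995, 6: 1.0}
print(select_lambda_min(perf, 0.05))  # 4
```

A tighter threshold pushes the result toward the full wavelength count, and a looser one allows fewer active wavelengths and lower laser power.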

### POPSTAR: 2.5D manycore system with PNoCs

#### POPSTAR → Processors On Photonic Silicon inTerposer ARchitecture



- 96-core 2.5D manycore system
- Off-chip laser emits up to 6 wavelengths
  - → Data rate of 12Gbps
  - → Peak aggregate bandwidth of 576Gbps
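One way to read these numbers: 6 wavelengths at 12Gbps each give 72Gbps per wavelength group, and a 576Gbps aggregate then corresponds to 8 such groups. The count of 8 is inferred from the arithmetic and is not stated in the source.

```python
wavelengths = 6
rate_gbps = 12
aggregate_gbps = 576

per_group_gbps = wavelengths * rate_gbps            # 6 * 12 = 72
implied_groups = aggregate_gbps // per_group_gbps   # 576 / 72 = 8 (inferred, not from the source)
print(per_group_gbps, implied_groups)  # 72 8
```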

### PNoC power-bandwidth tradeoff





- PNoC power increases with the number of activated laser wavelengths ($\lambda_{act}$)
- System performance saturates at $\lambda_{min} < \lambda_{tot}$, which depends on the application's bandwidth requirement

### Experimental results



## System-level optimization → essential for PNoCs

Our work aims at *reducing the PNoC power (thermal tuning, EOE and laser source power)* via workload allocation, thermal tuning policies, wavelength selection and design-time techniques.

# Design-time techniques



PNoC power- and area-optimized layout for given design inputs

# Workload allocation and tuning policies



#### Wavelength selection



#### **Contributors**



Aditya Narayan, Dr. Tiansheng Zhang, Yenai Ma, Furkan Eris, Prof. Ayse Coskun, Prof. Ajay Joshi



Yvain Thonnart, Dr. Pascal Vivet



Prof. Andrew Kahng, Vaishnav Srinivas