# An Approximate Parallel Multiplier with Deterministic Errors for Ultra-High Speed Integrated Optical Circuits

<u>Jun Shiomi</u><sup>1</sup>, Tohru Ishihara<sup>1</sup>, Hidetoshi Onodera<sup>1</sup>, Akihiko Shinya<sup>2</sup>, Masaya Notomi<sup>2</sup>

<sup>1</sup>Graduate School of Informatics, Kyoto University, Japan <sup>2</sup>NTT Nanophotonics Center / NTT Basic Research Laboratories, Japan

#### **Beyond Optical Communication**

Background: Advancement in nanophotonic devices

Enables on-chip interconnects for broadband communication



Goal: Add **functional units** to boost on-chip communication, e.g. pattern recognition

### Approximate Parallel Multiplier with Nanophotonic Devices



- ✓ Optical implementation of approximate parallel multiplier
  - For Machine Learning (ML), Approximate Computing (AC)
  - 11% <u>deterministic</u> error in the worst case
- ✓ Enables signal processing with ultra-low latency
  - Example (32-bit parallel multiplier)

    W/o approximation: > 800 ps (< 1.25 GHz)

    This work: 123 ps (= 8.1 GHz)

#### Outline

- Background
- Parallel Multiplier Using Nanophotonic Devices
- Approximate Parallel Multiplier
  - Overview
  - Detailed Implementation
- Performance Evaluation
- Conclusion

### Photonic Crystal-Based Optical Pass-Gate (OPG) [2]





[2] T. Ishihara, et al., International SoC Design Conference, 2016

#### Optoelectric (O/E) Conversion Delay



✓ Reducing O/Es on a critical path is a key to ultra-fast operation.

### Issues in Conventional Optical Parallel Multiplier



$$1\tau_{\rm OE} = 25\,\mathrm{ps}\ (40\,\mathrm{GHz}) \longrightarrow 32\tau_{\rm OE} = 800\,\mathrm{ps}\ (1.25\,\mathrm{GHz}) \longrightarrow 64\tau_{\rm OE} = 1600\,\mathrm{ps}\ (625\,\mathrm{MHz})$$

Too slow!
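The scaling on this slide is plain arithmetic, and a short Python sketch can reproduce it. The sketch assumes, as the quoted figures suggest, that the critical path of the conventional multiplier crosses one O/E converter per stage, each costing $\tau_{\rm OE} = 25$ ps. The function names are illustrative and do not come from the original work.

```python
TAU_OE_PS = 25.0  # O/E conversion delay quoted on the slide, in picoseconds


def oe_chain_latency_ps(num_oe: int) -> float:
    """Latency of a critical path crossing `num_oe` O/E converters in series."""
    return num_oe * TAU_OE_PS


def max_frequency_ghz(latency_ps: float) -> float:
    """Highest rate of one operation per cycle (1 / 1 ps = 1000 GHz)."""
    return 1000.0 / latency_ps


for stages in (1, 32, 64):
    latency = oe_chain_latency_ps(stages)
    print(f"{stages:2d} O/E stages: {latency:6.0f} ps, "
          f"{max_frequency_ghz(latency):.3f} GHz")
```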


### Basic Idea of the Proposed Approximate Multiplier



✓ Three O/E converters on a critical path for any bit width n

### Concept of Approximate Logarithm [3]

$$2^{\log_2 X_n + \log_2 Y_n}$$



✓ Approximate antilogarithm: Inverse function of the logarithm

[3] J. N. Mitchell, IRE Transactions on Electronic Computers, 1962
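Mitchell's method [3] can be illustrated in software. The sketch below is a floating-point model of the idea, not the optical circuit: writing $X = 2^k(1 + m)$ with $0 \le m < 1$, it approximates $\log_2 X \approx k + m$, adds the two approximate logarithms, and applies the matching approximate antilogarithm. All function names are hypothetical.

```python
import math


def mitchell_log2(x: int) -> float:
    """Approximate log2(x) as k + m, where x = 2**k * (1 + m) and 0 <= m < 1."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    k = x.bit_length() - 1      # position of the leading one (characteristic)
    m = x / (1 << k) - 1.0      # bits below the leading one, as a fraction
    return k + m


def mitchell_antilog2(value: float) -> float:
    """Inverse of the approximation above: 2**k * (1 + m) for value = k + m."""
    k = math.floor(value)
    return (1.0 + (value - k)) * 2.0 ** k


def mitchell_multiply(x: int, y: int) -> float:
    """Approximate x * y as antilog2(log2(x) + log2(y))."""
    return mitchell_antilog2(mitchell_log2(x) + mitchell_log2(y))
```

For example, `mitchell_multiply(3, 3)` returns 8.0 instead of 9, an error of about -11%, while `mitchell_multiply(4, 5)` returns the exact product 20.0 because 4 is a power of two.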

### Priority Encoder-Based Approximate Logarithm (n = 4)

$$2^{\log_2 X_n + \log_2 Y_n}$$



✓ Only one O/E converter on a critical path for any bit width n
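As a behavioural illustration of the priority-encoder approach, the Python sketch below computes the approximate logarithm with integers only. It is an assumption-laden software model, not the optical implementation: the priority encoder finds the leading one, which gives the integer part, and the remaining bits are left-aligned into an (n - 1)-bit fraction. The names `priority_encode` and `approx_log2_fixed` are hypothetical.

```python
def priority_encode(x: int, n_bits: int) -> int:
    """Return the index of the most significant 1 bit, scanning from the MSB."""
    for i in range(n_bits - 1, -1, -1):
        if (x >> i) & 1:
            return i
    raise ValueError("input must be non-zero")


def approx_log2_fixed(x: int, n_bits: int = 4) -> tuple:
    """Return (characteristic, fraction) of the approximate log2 of x.

    The fraction holds n_bits - 1 bits, so the value is
    characteristic + fraction / 2**(n_bits - 1).
    """
    if not 0 < x < (1 << n_bits):
        raise ValueError("x must fit in n_bits and be non-zero")
    k = priority_encode(x, n_bits)
    mantissa = x ^ (1 << k)                   # clear the leading one
    fraction = mantissa << (n_bits - 1 - k)   # left-align below the binary point
    return k, fraction


print(approx_log2_fixed(3))  # 3 = 0b0011, leading one at bit 1
```

With n = 4, the input 3 yields characteristic 1 and fraction 0b100, that is 1.5, which matches Mitchell's approximation of $\log_2 3$.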

### Barrel Shifter-Based Approximate Antilogarithm (n = 4)

$$2^{\log_2 X_n + \log_2 Y_n}$$
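The antilogarithm step can be modelled the same way. In the Python sketch below, the sum of the two approximate logarithms is assumed to be a fixed-point integer with `frac_bits` fraction bits (3 for n = 4). The fraction is prefixed with an implicit leading one, and the integer part selects the shift amount, which is the job a barrel shifter performs. The function name and the fixed-point encoding are assumptions made for illustration.

```python
def approx_antilog2_fixed(log_sum: int, frac_bits: int = 3) -> int:
    """Approximate 2**(log_sum / 2**frac_bits) with a single shift."""
    if log_sum < 0:
        raise ValueError("log_sum must be non-negative")
    k = log_sum >> frac_bits                     # integer part: shift amount
    fraction = log_sum & ((1 << frac_bits) - 1)  # fractional part
    significand = (1 << frac_bits) | fraction    # 1.fraction in fixed point
    shift = k - frac_bits                        # rescale back to an integer
    if shift >= 0:
        return significand << shift
    return significand >> -shift


# log2(3) is approximated as 1.100b = 12, so 3 * 3 corresponds to 12 + 12 = 24.
print(approx_antilog2_fixed(24))
```

Here 24 encodes 3.000b, so the significand 1.000b is shifted to give 8, the approximate product of 3 and 3.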




### Performance Analysis



✓ Latency = $3\tau_{\rm OE} + O(n\tau_{\rm OPG})$, where $\tau_{\rm OE} = 25$ ps

✓ Error: Deterministic [3]

- -11% when $X_n = 3 \cdot 2^i$ and $Y_n = 3 \cdot 2^j$
- 0% when $X_n = 2^i$ or $Y_n = 2^j$

where $i, j \in \mathbb{Z}_+$
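The error bounds can be checked exhaustively for small bit widths. The Python sketch below assumes an integer formulation of Mitchell's approximation [3] that is exact in its arithmetic, so the only error is that of the approximation itself. It enumerates all non-zero input pairs and reports the extreme relative errors. The helper names and the 6-bit width are choices made here for illustration.

```python
from fractions import Fraction


def mitchell_multiply(x: int, y: int) -> int:
    """Mitchell's approximate product of two positive integers."""
    kx, ky = x.bit_length() - 1, y.bit_length() - 1
    fx, fy = x - (1 << kx), y - (1 << ky)   # bits below each leading one
    base = 1 << (kx + ky)                   # 2**(kx + ky)
    cross = (fx << ky) + (fy << kx)         # (mx + my) * 2**(kx + ky)
    if cross < base:                        # mx + my < 1: no carry
        return base + cross
    return 2 * cross                        # carry into the characteristic


def error_range(n_bits: int):
    """Return (worst error, best error, inputs attaining the worst error)."""
    worst, best, worst_inputs = Fraction(0), Fraction(-1), []
    for x in range(1, 1 << n_bits):
        for y in range(1, 1 << n_bits):
            exact = x * y
            err = Fraction(mitchell_multiply(x, y) - exact, exact)
            best = max(best, err)
            if err < worst:
                worst, worst_inputs = err, [(x, y)]
            elif err == worst:
                worst_inputs.append((x, y))
    return worst, best, worst_inputs


worst, best, inputs = error_range(6)
print(float(worst), float(best), inputs[:3])
```

Under these assumptions the worst case is exactly -1/9 (about -11.1%), reached only when both operands have the form $3 \cdot 2^i$, and the approximation never overestimates the product.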



### Performance Comparison

|                                 | Latency                                | Error      |
|---------------------------------|----------------------------------------|------------|
| Conventional (array multiplier) | $> n\tau_{\rm OE}$                     | 0          |
| Proposed                        | $3\tau_{\rm OE} + O(n\tau_{\rm OPG})$  | [-11%, 0%] |



Conventional (n = 32): > 800 ps (< 1.25 GHz)

Proposed (n = 32): 123 ps (8.1 GHz), a 6.5× boost

- ✓ More than 6.5 × faster w/ deterministic errors
  - Application: Machine learning, approximate computing
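The headline figures for n = 32 follow from the numbers quoted on these slides, as the short Python check below shows. It assumes that the 123 ps latency includes the three O/E conversions at 25 ps each, so the remainder is attributed to the $O(n\tau_{\rm OPG})$ term. That split is an inference made here, not a figure reported in the source.

```python
TAU_OE_PS = 25.0                    # O/E conversion delay, in picoseconds
CONVENTIONAL_PS = 32 * TAU_OE_PS    # lower bound for the array multiplier
PROPOSED_PS = 123.0                 # latency reported for this work

speedup = CONVENTIONAL_PS / PROPOSED_PS
proposed_ghz = 1000.0 / PROPOSED_PS            # 1 / 1 ps = 1000 GHz
opg_share_ps = PROPOSED_PS - 3 * TAU_OE_PS     # inferred OPG contribution

print(f"speedup: {speedup:.2f}x")
print(f"proposed rate: {proposed_ghz:.2f} GHz")
print(f"inferred OPG term: {opg_share_ps:.0f} ps")
```

Because the conventional latency is a lower bound (> 800 ps), the resulting speedup is itself a lower bound, consistent with the "more than 6.5×" claim.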

### Verification Using Optoelectronic Circuit Simulator (n = 4)



#### Conclusion and Future Work

- ✓ Approximate parallel multiplier using optical pass-gates
  - Ultra-fast: 8.1 GHz, more than 6.5× faster operation
  - Deterministic error: From -11% to 0%
  - Correct operation confirmed by optoelectronic circuit simulator

- Future work
  - Power & area evaluation
  - Comparison with CMOS-based parallel multiplier