Energy Efficient Convolutions with Temporal Arithmetic

Authors:

Timothy Sherwood

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2

Pages 354 - 368

https://doi.org/10.1145/3620665.3640395

Published: 27 April 2024

Abstract

Convolution is an important operation at the heart of many applications, including image processing, object detection, and neural networks. While data movement and coordination operations continue to be important areas for optimization in general-purpose architectures, for computation fused with sensor operation, the underlying multiply-accumulate (MAC) operations dominate power consumption. Non-traditional data encoding has been shown to reduce the energy consumption of this arithmetic, with options including everything from reduced-precision floating point to fully stochastic operation, but all of these approaches start with the assumption that a complete analog-to-digital conversion (ADC) has already been done for each pixel. While analog-to-time converters have been shown to use less energy, arithmetically manipulating temporally encoded signals beyond simple min, max, and delay operations has not previously been possible, meaning operations such as convolution have been out of reach. In this paper we show that arithmetic manipulation of temporally encoded signals is possible, practical to implement, and extremely energy efficient.

The core of this new approach is a negative log transformation of the traditional numeric space into a 'delay space' where scaling (multiplication) becomes delay (addition in time). The challenge lies in dealing with addition and subtraction. We show that these operations can also be done directly in this negative log delay space, that the associative and commutative properties still apply to the transformed operations, and that accurate approximations can be built efficiently in hardware using delay elements and basic CMOS logic elements. Furthermore, we show that these operations can be chained together in space or operated recurrently in time. This approach fits naturally into the staged ADC readout inherent to most modern cameras. To evaluate our approach, we develop a software system that automatically transforms traditional convolutions into delay space architectures. The resulting system is used to analyze and balance error from both a new temporal equivalent of quantization and delay element noise, resulting in designs that improve the energy per pixel of each convolution frame by more than 2× compared to the state of the art while improving the energy delay product by four orders of magnitude.
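As a rough illustration of the delay-space idea described above, the following Python sketch encodes a positive value x as a delay t = -ln(x), so that multiplication becomes addition of delays and addition becomes -ln(e^(-t_a) + e^(-t_b)). This is only the exact floating-point math implied by the negative log transformation, not the approximate hardware circuits the paper builds. All function names here (to_delay, from_delay, delay_mul, delay_add, delay_dot) are hypothetical, and the sketch assumes strictly positive pixels and weights, so subtraction and negative weights are out of scope.

```python
import math


def to_delay(x):
    """Encode a strictly positive value x as a delay t = -ln(x)."""
    return -math.log(x)


def from_delay(t):
    """Decode a delay t back into the value e^(-t)."""
    return math.exp(-t)


def delay_mul(t_a, t_b):
    """Multiplying two values corresponds to adding their delays."""
    return t_a + t_b


def delay_add(t_a, t_b):
    """Adding two values corresponds to -ln(e^-t_a + e^-t_b).

    Factoring out the earlier (smaller) delay keeps the computation
    numerically stable: the result is always slightly earlier than
    min(t_a, t_b).
    """
    t_min = min(t_a, t_b)
    return t_min - math.log1p(math.exp(-abs(t_a - t_b)))


def delay_dot(pixels, weights):
    """One convolution output (a dot product) evaluated in delay space.

    Assumes every pixel and weight is strictly positive (an assumption
    of this sketch, not a statement about the paper's design).
    """
    acc = None
    for p, w in zip(pixels, weights):
        term = delay_mul(to_delay(p), to_delay(w))
        acc = term if acc is None else delay_add(acc, term)
    return from_delay(acc)


if __name__ == "__main__":
    pixels = [0.5, 0.25, 0.8]
    weights = [0.2, 0.4, 0.1]
    direct = sum(p * w for p, w in zip(pixels, weights))
    print(direct, delay_dot(pixels, weights))
```

Because delay_add is just ordinary addition viewed through the negative log, it inherits associativity and commutativity, so the accumulation order in delay_dot does not change the decoded result beyond floating-point rounding. Values in (0, 1] map to non-negative delays, which is one reason normalized pixel intensities are a convenient input range for this kind of sketch.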
