DOI: 10.1109/MICRO.2012.48

Neural Acceleration for General-Purpose Approximate Programs

Published: 01 December 2012

Abstract

This paper describes a learning-based approach to the acceleration of approximate programs. We describe the Parrot transformation, a program transformation that selects and trains a neural network to mimic a region of imperative code. After the learning phase, the compiler replaces the original code with an invocation of a low-power accelerator called a neural processing unit (NPU). The NPU is tightly coupled to the processor pipeline to accelerate small code regions. Since neural networks produce inherently approximate results, we define a programming model that allows programmers to identify approximable code regions: code that can produce imprecise but acceptable results. Offloading approximable code regions to NPUs is faster and more energy efficient than executing the original code. For a set of diverse applications, NPU acceleration provides whole-application speedup of 2.3x and energy savings of 3.0x on average with quality loss of at most 9.6%.
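
The flow the abstract describes (observe a code region, train a network to mimic it, then call the network in place of the original code) can be illustrated in software. The following is a minimal sketch under stated assumptions, not the paper's compiler or NPU: the target function `region`, the 2-4-1 topology, the learning rate, and all other names are hypothetical choices made for illustration, and the network is trained with plain stochastic backpropagation in pure Python.

```python
import math
import random

def region(x, y):
    """Hypothetical approximable code region: the function to be mimicked."""
    return math.sqrt(x * x + y * y)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """2 inputs -> `hidden` sigmoid neurons -> 1 linear output (assumed topology)."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        # Each hidden neuron holds two input weights plus a bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(hidden)]
        # Output layer: one weight per hidden neuron, then a bias as the last entry.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]

    def forward(self, x, y):
        h = [sigmoid(w[0] * x + w[1] * y + w[2]) for w in self.w1]
        out = sum(w * hj for w, hj in zip(self.w2, h)) + self.w2[-1]
        return out, h

    def train_step(self, x, y, target, lr=0.1):
        """One stochastic gradient step on the squared error 0.5 * (out - target)**2."""
        out, h = self.forward(x, y)
        err = out - target
        for j, hj in enumerate(h):
            # Gradient at the hidden pre-activation, computed before w2[j] changes.
            grad_h = err * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * err * hj
            self.w1[j][0] -= lr * grad_h * x
            self.w1[j][1] -= lr * grad_h * y
            self.w1[j][2] -= lr * grad_h
        self.w2[-1] -= lr * err

def mean_abs_error(net, samples):
    return sum(abs(net.forward(x, y)[0] - region(x, y)) for x, y in samples) / len(samples)

# Observation phase: record inputs of the original region (random here for illustration).
rng = random.Random(1)
train_inputs = [(rng.random(), rng.random()) for _ in range(200)]
held_out = [(rng.random(), rng.random()) for _ in range(100)]

# Learning phase: fit the network to the region's observed input/output behaviour.
net = TinyMLP()
error_before = mean_abs_error(net, held_out)
for _ in range(400):
    for x, y in train_inputs:
        net.train_step(x, y, region(x, y))
error_after = mean_abs_error(net, held_out)

def region_approx(x, y):
    """Replacement for `region`: invokes the trained network instead of the original code."""
    return net.forward(x, y)[0]

print(f"mean abs error before training: {error_before:.3f}, after: {error_after:.3f}")
```

In this sketch, `region_approx` stands in for the accelerator invocation that replaces the original code. The held-out error gives a rough sense of the quality loss a programmer would have to judge acceptable before marking a region as approximable.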


Published In

MICRO-45: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
December 2012
487 pages
ISBN: 9780769549248

Publisher

IEEE Computer Society, United States


Author Tags

  1. Accelerator
  2. Approximate Computing
  3. NPU
  4. Neural Networks
  5. Neural Processing Unit

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
