research-article

Public Access

Architecture-Aware Approximate Computing

Authors: Mustafa Karakoy, Mahmut Taylan Kandemir, Meenakshi Arunachalam

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 3, Issue 2

Article No.: 38, Pages 1-24

https://doi.org/10.1145/3341617.3326153

Published: 19 June 2019

Abstract

Deliberate use of approximate computing has recently been an active research area. Observing that many application programs from different domains can tolerate less-than-perfect accuracy, existing techniques trade off program output accuracy for performance and energy savings. While these works provide point solutions, they leave three critical questions about approximate computing unanswered, especially in the context of dropping (skipping) costly data accesses: (i) What is the maximum potential of skipping (i.e., not performing) data accesses under a given inaccuracy bound? (ii) Can we choose the data accesses to drop at random, or is being architecture aware (i.e., identifying the costliest data accesses on a given architecture) critical? (iii) Do two executions that skip the same number of data accesses always produce the same output quality (error)? This paper first answers these questions using ten multithreaded workloads. Then, motivated by the negative answer to the third question, it presents a program slicing-based approach that identifies the set of data accesses to drop such that (i) the resulting performance and energy benefits are maximized and (ii) the execution remains within the error (inaccuracy) bound specified by the user. The approach first applies backward slicing and then forward slicing to decide which data accesses to drop. Our experimental evaluations show that, averaged over all ten benchmark programs, an 8.8% performance improvement and a 13.7% energy saving are possible with a 2% error bound, and these improvements rise to 15% and 25%, respectively, when the error bound is raised to 4%.
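The abstract describes the approach only at a high level: backward slicing, then forward slicing, then choosing data accesses to drop under an error bound. As a toy illustration of those notions, the Python sketch below computes a backward slice (every statement that can influence the output) and a forward slice (every statement a skipped load can affect) on a hand-built dependence graph, then greedily picks loads to drop under an error bound. The graph `DEPS`, the cost and error numbers, and the greedy ranking in `choose_drops` are all hypothetical assumptions; this is not the authors' algorithm or tool.

```python
from collections import deque

# Hypothetical def -> use dependence graph for a tiny kernel (illustrative
# names only, not taken from the paper). "out" is the program output.
DEPS = {
    "load_a": ["mul"],
    "load_b": ["mul"],
    "load_c": ["add"],
    "mul": ["add"],
    "add": ["out"],
    "load_d": ["log"],  # feeds only a diagnostic, never the output
    "log": [],
    "out": [],
}


def reverse(graph):
    """Flip every def -> use edge into a use -> def edge."""
    rev = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            rev[dst].append(src)
    return rev


def reachable(graph, start):
    """All nodes reachable from `start` (inclusive), found by BFS."""
    seen, work = {start}, deque([start])
    while work:
        node = work.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def backward_slice(graph, criterion):
    """Statements whose values can flow into `criterion`."""
    return reachable(reverse(graph), criterion)


def forward_slice(graph, stmt):
    """Statements whose values can change if `stmt` is skipped."""
    return reachable(graph, stmt)


def choose_drops(candidates, cost, err, bound):
    """Greedy heuristic (an assumption, not the paper's method): drop
    accesses in order of estimated cost saved per unit of error, keeping
    the summed error estimate within `bound`. Assumes errors add up."""
    ranked = sorted(candidates, key=lambda a: cost[a] / max(err[a], 1e-12),
                    reverse=True)
    chosen, total = [], 0.0
    for acc in ranked:
        if total + err[acc] <= bound:
            chosen.append(acc)
            total += err[acc]
    return chosen, total


loads = [name for name in DEPS if name.startswith("load_")]
in_slice = backward_slice(DEPS, "out")
free = [name for name in loads if name not in in_slice]    # cannot affect "out"
candidates = [name for name in loads if name in in_slice]  # may affect "out"
cost = {"load_a": 300, "load_b": 10, "load_c": 150}        # made-up cycles
err = {"load_a": 0.015, "load_b": 0.001, "load_c": 0.03}   # made-up error
drops, total = choose_drops(candidates, cost, err, bound=0.02)
print(free, drops, round(total, 3))
```

Here `load_d` lies outside the backward slice of `out`, so skipping it cannot change the output at all. Among the remaining loads, dropping `load_b` alone and dropping `load_c` alone each skip one access but, under these made-up estimates, incur very different error, which mirrors the intuition behind question (iii). A real implementation would compute slices over actual program dependences and would need measured, not assumed, error estimates.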



Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures


Information

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems Volume 3, Issue 2

June 2019

683 pages

EISSN: 2476-1249

DOI: 10.1145/3341617

Editors: Augustin Chaintreau (Columbia University), Thomas Bonald (Telecom ParisTech), Nick Duffield (Texas A&M University)
Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 June 2019

Published in POMACS Volume 3, Issue 2

Qualifiers

Research-article

Funding Sources

National Science Foundation

Bibliometrics & Citations

Article Metrics

Total citations: 5
Total downloads: 540
Downloads (last 12 months): 124
Downloads (last 6 weeks): 12

Reflects downloads up to 25 Feb 2025

Cited By

Leon V, Hanif M, Armeniakos G, Jiao X, Shafique M, Pekmestzi K, Soudris D (2025). Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications. ACM Computing Surveys 57:7, 1-36. DOI: 10.1145/3711683. Online publication date: 20-Feb-2025.
https://dl.acm.org/doi/10.1145/3711683

Dalloo A, Jaleel Humaidi A, Al Mhdawi A, Al-Raweshidy H (2024). Approximate Computing: Concepts, Architectures, Challenges, Applications, and Future Directions. IEEE Access 12, 146022-146088. DOI: 10.1109/ACCESS.2024.3467375. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3467375

Morie Y, Wada Y, Kobayashi R, Sakamoto R (2023). Data Transfer API and its Performance Model for Rank-Level Approximate Computing on HPC Systems. International Journal of Networking and Computing 13:1, 48-61. DOI: 10.15803/ijnc.13.1_48. Online publication date: 2023.
https://doi.org/10.15803/ijnc.13.1_48

Dou Y, Wang C (2023). An Optimization Technique for PMF Estimation in Approximate Circuits. Journal of Computer Science and Technology 38:2, 289-297. DOI: 10.1007/s11390-023-2544-z. Online publication date: 30-Mar-2023.
https://dl.acm.org/doi/10.1007/s11390-023-2544-z

Altuntaş Z, Arslan S, Boz B (2023). Approximate execution and grouping of critical sections for performance-accuracy tradeoff. Concurrency and Computation: Practice and Experience 35:24. DOI: 10.1002/cpe.7614. Online publication date: 11-Jan-2023.
https://doi.org/10.1002/cpe.7614
