
Architecture-Aware Approximate Computing

Published: 19 June 2019

Abstract

Deliberate use of approximate computing has recently been an active research area. Observing that many applications from different domains can tolerate less-than-perfect accuracy, existing techniques trade program output accuracy for performance and energy savings. While these works provide point solutions, they leave three critical questions about approximate computing unanswered, especially in the context of dropping (skipping) costly data accesses: (i) What is the maximum potential of skipping (i.e., not performing) data accesses under a given inaccuracy bound? (ii) Can the data accesses to drop be chosen randomly, or is it critical to be architecture aware (i.e., to identify the costliest data accesses in a given architecture)? (iii) Do two executions that skip the same number of data accesses always result in the same output quality (error)? This paper first answers these questions using ten multithreaded workloads and then, motivated by the negative answer to the third question, presents a program slicing-based approach that identifies the set of data accesses to drop such that (i) the resulting performance and energy benefits are maximized and (ii) the execution remains within the user-specified error (inaccuracy) bound. The approach first uses backward slicing and then forward slicing to decide which data accesses to drop. Experimental evaluations using ten multithreaded workloads show that, averaged over all benchmark programs, an 8.8% performance improvement and a 13.7% energy saving are possible with an error bound of 2%; the corresponding improvements rise to 15% and 25%, respectively, when the error bound is raised to 4%.
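
The abstract names backward and forward slicing but does not spell out the algorithm, so the following is only a minimal sketch of the two slicing directions on a toy dependence graph, not the paper's method. The statement names (`load_a`, `t1`, `out`, and so on), the `DEPS` table, and the helper functions are all hypothetical. A backward slice from the output collects every statement the output depends on; a forward slice from a load collects every statement that load can influence. In this toy, a load that falls outside the backward slice of the output can be skipped without changing that output at all.

```python
from collections import defaultdict, deque

# Hypothetical toy program: each statement maps to the statements it reads from.
DEPS = {
    "load_a": [],                  # a = A[i]
    "load_b": [],                  # b = B[i]
    "load_c": [],                  # c = C[i]
    "t1": ["load_a", "load_b"],    # t1 = a * b
    "t2": ["load_c"],              # t2 = c + 1
    "out": ["t1"],                 # out = t1 + 3   (the result we care about)
    "log": ["t2"],                 # log = t2       (side output, not part of the result)
}
LOADS = {"load_a", "load_b", "load_c"}


def build_graphs(deps):
    """Return (backward, forward) adjacency maps built from producer lists."""
    backward = defaultdict(set)  # statement -> statements it depends on
    forward = defaultdict(set)   # statement -> statements that depend on it
    for stmt, producers in deps.items():
        for producer in producers:
            backward[stmt].add(producer)
            forward[producer].add(stmt)
    return backward, forward


def slice_from(graph, start):
    """Breadth-first closure: every statement reachable from `start` in `graph`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


backward, forward = build_graphs(DEPS)

# Backward slice: which statements can affect the final output?
needed = slice_from(backward, "out")

# Loads outside that slice cannot change `out`, so they are drop candidates here.
droppable = sorted(load for load in LOADS if load not in needed)

# Forward slice: what else would be perturbed if we skipped such a load?
affected = sorted(slice_from(forward, "load_c"))

print(droppable)  # ['load_c']
print(affected)   # ['load_c', 'log', 't2']
```

A real system would go further than this zero-impact case: it would presumably weigh each access's cost on the target architecture against its contribution to output error, which this sketch does not attempt to model.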

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 3, Issue 2
June 2019
683 pages
EISSN: 2476-1249
DOI: 10.1145/3341617
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximate computing
  2. compiler
  3. manycore system
