survey

Approximate Communication: Techniques for Reducing Communication Bottlenecks in Large-Scale Parallel Systems

Authors:

Karen Khatamifard,

David J. Lilja,

Ulya Karpuzcu

ACM Computing Surveys (CSUR), Volume 51, Issue 1

Article No.: 1, Pages 1 - 32

https://doi.org/10.1145/3145812

Published: 10 January 2018

Abstract

Approximate computing has recently gained research attention as a way to increase energy efficiency and/or performance by exploiting some applications’ intrinsic error resiliency. However, little attention has been given to its potential for tackling the communication bottleneck, which remains one of the looming challenges for efficient parallelism. This article explores the potential benefits of approximate computing for communication reduction by surveying three promising techniques for approximate communication: compression, relaxed synchronization, and value prediction. The techniques are compared using an evaluation framework composed of communication cost reduction, performance, energy reduction, applicability, overheads, and output degradation. The comparison shows that lossy link compression and approximate value prediction hold great promise for reducing the communication bottleneck in bandwidth-constrained applications. Relaxed synchronization, meanwhile, provides large speedups for select error-tolerant applications, but suffers from limited general applicability and unreliable guarantees on output degradation. The article concludes with several suggestions for future research on approximate communication techniques.

References

[1]

Tor M. Aamodt and Paul Chow. 2008. Compile-time and instruction-set methods for improving floating-to fixed-point conversion accuracy. ACM Transactions on Embedded Computing Systems 7, 3, 26.

Digital Library

[2]

Bülent Abali, Hubertus Franke, Dan E. Poff, Robert A. Saccone, Jr., Charles O. Schulz, Lorraine M. Herger, and T. Basil Smith. 2001. Memory expansion technology (MXT): software support and performance. IBM Journal of Research and Development 45, 2, 287--301.

Digital Library

[3]

Don Adams. 1993. CRAY T3D System Architecture Overview Manual. Retrieved November 29, 2017 from ftp://ftp.cray.com/product-info/mpp/T3D_Architecture_Over/T3D.overview.html.

[4]

Ismail Akturk, Karen Khatamifard, and Ulya R. Karpuzcu. 2015. On quantification of accuracy loss in approximate computing. In Workshop on Duplicating, Deconstructing and Debunking (WDDD’15). 15.

[5]

Alaa R. Alameldeen and David A. Wood. 2004. Adaptive cache compression for high-performance processors. In Proceedings of the 31st Annual International Symposium on Computer Architecture. IEEE, 212--223.

[6]

Alaa R. Alameldeen and David A. Wood. 2007. Interactions between compression and prefetching in chip multiprocessors. In IEEE 13th International Symposium on High Performance Computer Architecture (HPCA’07). IEEE, 228--239.

[7]

George Almási, Philip Heidelberger, Charles J. Archer, Xavier Martorell, C. Chris Erway, José E. Moreira, B. Steinmacher-Burow, and Yili Zheng. 2005. Optimization of MPI collective communication on BlueGene/L systems. In Proceedings of the 19th Annual International Conference on Supercomputing (ICS’05). ACM, New York, NY, 253--262.

Digital Library

[8]

Carlos Alvarez, Jesus Corbal, and Mateo Valero. 2005. Fuzzy memoization for floating-point multimedia applications. IEEE Transactions on Computers 54, 7, 922--927.

Digital Library

[9]

Gene M. Amdahl. 1967. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, Spring Joint Computer Conference. ACM, 483--485.

Digital Library

[10]

Baik Song An, Manhee Lee, Ki Hwan Yum, and Eun Jung Kim. 2012. Efficient data packet compression for cache coherent multiprocessor systems. In Data Compression Conference (DCC’12). IEEE, 129--138.

Digital Library

[11]

Mohammad Ashraful Anam, Paul N. Whatmough, and Yiannis Andreopoulos. 2013. Precision-energy-throughput scaling of generic matrix multiplication and discrete convolution kernels via linear projections. In IEEE 11th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia’13). IEEE, 21--30.

[12]

Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, and Saman Amarasinghe. 2009. PetaBricks: A Language and Compiler for Algorithmic Choice. Vol. 44. ACM.

[13]

Jason Ansel, Yee Lok Wong, Cy Chan, Marek Olszewski, Alan Edelman, and Saman Amarasinghe. 2011. Language and compiler support for auto-tuning variable-accuracy algorithms. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization. IEEE Computer Society, 85--96.

Digital Library

[14]

Woongki Baek and Trishul M. Chilimbi. 2010. Green: A framework for supporting energy-conscious programming using controlled approximation. In ACM SIGPLAN Notices, Vol. 45. ACM, 198--209.

[15]

Arnab Banerjee, Pascal T. Wolkotte, Robert D. Mullins, Simon W. Moore, and Gerard J. M. Smit. 2009. An energy and performance exploration of network-on-chip architectures. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 17, 3, 319--329.

Digital Library

[16]

Carl J. Beckmann and Constantine D. Polychronopoulos. 1990. Fast barrier synchronization hardware. In Proceedings of the 1990 ACM/IEEE Conference on Supercomputing (Supercomputing’90). IEEE Computer Society, Washington, DC, USA, 180--189.

[17]

K. Bergman and others. 2008. Exascale computing study: Technology challenges in achieving exascale systems. Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Tech. Rep 15 (2008).

[18]

Tekin Bicer, Jian Yin, Dereck Chiu, Gagan Agrawal, and Karen Schuchardt. 2013. Integrating online compression to accelerate large-scale data analytics applications. In IEEE 27th International Symposium on Parallel 8 Distributed Processing (IPDPS’13). IEEE, 1205--1216.

Digital Library

[19]

Mark Buckler, Wayne Burleson, and Greg Sadowski. 2013. Low-power networks-on-chip: Progress and remaining challenges. In 2013 IEEE International Symposium on Low Power Electronics and Design (ISLPED’13). IEEE, 132--134.

[20]

Huy Bui, Hal Finkel, Venkatram Vishwanath, Salman Habib, Katrin Heitmann, Jason Leigh, Michael Papka, and Kevin Harms. 2014. Scalable parallel I/O on a blue gene/Q supercomputer using compression, topology-aware data aggregation, and subfiling. In 22nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP’14). IEEE, 107--111.

Digital Library

[21]

Surendra Byna, Jiayuan Meng, Anand Raghunathan, Srimat Chakradhar, and Srihari Cadambi. 2010. Best-effort semantic document search on GPUs. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units. ACM, 86--93.

Digital Library

[22]

Brad Calder, Glenn Reinman, and Dean M. Tullsen. 1999. Selective value prediction. In Proceedings of the 26th International Symposium on Computer Architecture. IEEE, 64--74.

[23]

Ramon Canal, Antonio González, and James E. Smith. 2000. Very low power pipelines using significance compression. In Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture. ACM, 181--190.

[24]

Vito Cappellini. 1985. Data Compression and Error Control Techniques with Applications. Academic Press, Inc., Cambridge, MA.

[25]

Michael Carbin, Sasa Misailovic, and Martin C. Rinard. 2013. Verifying quantitative reliability for programs that execute on unreliable hardware. In ACM SIGPLAN Notices, Vol. 48. ACM, 33--52.

[26]

Srimat T. Chakradhar and Anand Raghunathan. 2010. Best-effort computing: Re-thinking parallel software and hardware. In 47th ACM/IEEE Design Automation Conference (DAC’10). IEEE, 865--870.

[27]

Jie Chen and W. Watson. 2008. Software barrier performance on dual quad-core opterons. International Conference on Networking, Architecture, and Storage, 2008 (NAS’08). 303--309.

Digital Library

[28]

Yen-Kuang Chen, Jatin Chhugani, Pradeep Dubey, Christopher J. Hughes, Daehyun Kim, Sanjeev Kumar, Victor W. Lee, Anthony D. Nguyen, and Mikhail Smelyanskiy. 2008. Convergence of recognition, mining, and synthesis workloads and its implications. Proceedings of IEEE 96, 5, 790--807.

[29]

Vinay K. Chippa, Hrishikesh Jayakumar, Debabrata Mohapatra, Kaushik Roy, and Anand Raghunathan. 2013. Energy-efficient recognition and mining processor using scalable effort design. In IEEE Custom Integrated Circuits Conference (CICC’13). IEEE, 1--4.

[30]

Vinay Kumar Chippa, Debabrata Mohapatra, Kaushik Roy, Srimat T. Chakradhar, and Anand Raghunathan. 2014. Scalable effort hardware design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22, 9, 2004--2016.

[31]

Vinay K. Chippa, Swagath Venkataramani, Srimat T. Chakradhar, Kaushik Roy, and Anand Raghunathan. 2013. Approximate computing: An integrated hardware approach. In Asilomar Conference on Signals, Systems and Computers. IEEE, 111--117.

[32]

Vinay K. Chippa, Swagath Venkataramani, Kaushik Roy, and Anand Raghunathan. 2014. StoRM: A stochastic recognition and mining processor. In Proceedings of the 2014 International Symposium on Low Power Electronics and Design. ACM, 39--44.

Digital Library

[33]

Kyungsang Cho, Yongjun Lee, Young H. Oh, Gyoo-cheol Hwang, and Jae W. Lee. 2014. eDRAM-based tiered-reliability memory with applications to low-power frame buffers. In IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED’14). IEEE, 333--338.

[34]

Marcelo Cintra and Josep Torrellas. 2002. Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 43--54.

Digital Library

[35]

R. J. Cintra. 2011. An integer approximation method for discrete sinusoidal transforms. Circuits, Systems, and Signal Processing 30, 6, 1481--1501.

Digital Library

[36]

Renato J. Cintra and Fábio M. Bayer. 2011. A DCT approximation for image compression. IEEE Signal Processing Letters 18, 10, 579--582.

[37]

Daniel Citron and Larry Rudolph. 1995. Creating a wider bus using caching techniques. In Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture. IEEE, 90--99.

Digital Library

[38]

Paul Coteus, H. Randall Bickford, Thomas M. Cipolla, Paul Crumley, Alan Gara, Shawn Hall, Gerard V. Kopcsay, Alphonso P. Lanzetta, Lawrence S. Mok, Rick A. Rand, Richard A. Swetz, Todd Takken, Paul La Rocca, Christopher Marroquin, Philip R. Germann, and Mark J. Jeanson. 2005. Packaging the blue gene/L supercomputer. IBM Journal of Research and Development 49, 2--3, 213--248.

Digital Library

[39]

David E. Culler, Jaswinder Pal Singh, and Anoop Gupta. 1999. Parallel Computer Architecture: A Hardware/Software Approach. Gulf Professional Publishing, Houston, TX.

[40]

William J. Dally and Brian Towles. 2001. Route packets, not wires: On-chip interconnection networks. In Proceedings of the Design Automation Conference. IEEE, 684--689.

[41]

Reetuparna Das, Asit K. Mishra, Chrysostomos Nicopoulos, Dongkook Park, Vijaykrishnan Narayanan, Ravishankar Iyer, Mazin S. Yousif, and Chita R. Das. 2008. Performance and power optimization through data compression in network-on-chip architectures. In IEEE 14th International Symposium on High Performance Computer Architecture (HPCA’08). IEEE, 215--225.

[42]

Marc De Kruijf, Shuou Nomura, and Karthikeyan Sankaralingam. 2010. Relax: An architectural framework for software recovery of hardware faults. In ACM SIGARCH Computer Architecture News, Vol. 38. ACM, 497--508.

Digital Library

[43]

Li Deng and Douglas O’Shaughnessy. 2003. Speech Processing: A Dynamic and Optimization-Oriented Approach. CRC Press, Boca Raton, FL.

[44]

J. Dongarra, P. Luszczek, and A. Petitet. 2003. The LINPACK benchmark: Past, present, and future. Concurrency and Computation: Practice and Experience 15, 9, 803--820.

[45]

Zidong Du, Avinash Lingamneni, Yunji Chen, Krishna Palem, Olivier Temam, and Chengyong Wu. 2014. Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators. In 19th Asia and South Pacific Design Automation Conference (ASP-DAC’14). IEEE, 201--206.

[46]

Peter Düben, Jeremy Schlachter, Sreelatha Yenugula, John Augustine, Christian Enz, K. Palem, T. N. Palmer, and others. 2015. Opportunities for energy efficient computing: A study of inexact general purpose processors for high-performance and big-data applications. In Proceedings of the 2015 Design, Automation 8 Test in Europe Conference 8 Exhibition. EDA Consortium, 764--769.

[47]

Pradeep Dubey. 2005. Recognition, mining and synthesis moves computers to the era of Tera. Technology@ Intel Magazine 9, 2, 1--10.

[48]

Peter Elias. 1955. Predictive coding--I. IRE Transactions on Information Theory 1, 1, 16--24.

[49]

Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In International Symposium on Computer Architecture.

Digital Library

[50]

Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. 2012. Architecture support for disciplined approximate programming. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems.

Digital Library

[51]

Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. 2012. Neural acceleration for general-purpose approximate programs. In Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 449--460.

Digital Library

[52]

Marius Evers, Po-Yung Chang, and Yale N. Patt. 1996. Using hybrid branch predictors to improve branch prediction accuracy in the presence of context switches. In ACM SIGARCH Computer Architecture News, Vol. 24. ACM, 3--11.

[53]

Yuntan Fang, Huawei Li, and Xiaowei Li. 2012. SoftPCM: Enhancing energy efficiency and lifetime of phase change memory in video applications via approximate write. In IEEE 21st Asian Test Symposium (ATS’12). IEEE, 131--136.

Digital Library

[54]

Eric Freudenthal and Olivier Peze. 1988. Efficient Synchronization Algorithms Using Fetch-and-Add on Multiple Bitfield Integers. Ultracomputer Note 148.

[55]

Shrikanth Ganapathy, Georgios Karakonstantis, Adam Teman, and Andreas Burg. 2015. Mitigating the impact of faults in unreliable memories for error-resilient applications. In Proceedings of the 52nd Annual Design Automation Conference. ACM, 102.

Digital Library

[56]

Bart Goeman, Hans Vandierendonck, and Koen De Bosschere. 2001. Differential FCM: Increasing value prediction accuracy by improving table usage efficiency. In 7th International Symposium on High-Performance Computer Architecture (HPCA’01). IEEE, 207--216.

[57]

Inigo Goiri, Ricardo Bianchini, Santosh Nagarakatte, and Thu D. Nguyen. 2015. Approxhadoop: Bringing approximations to mapreduce frameworks. In ACM SIGARCH Computer Architecture News, Vol. 43. ACM, 383--397.

[58]

Jill R. Goldschneider. 1997. Lossy Compression of Scientific Data Via Wavelets and Vector Quantization. Ph.D. thesis, University of Washington, Seattle, WA. https://digital.lib.washington.edu/researchworks/handle/1773/5881?show=full.

[59]

Beayna Grigorian and Glenn Reinman. 2015. Accelerating divergent applications on SIMD architectures using neural networks. ACM Transactions on Architecture and Code Optimization 12, 1, 2.

Digital Library

[60]

Vaibhav Gupta, Debabrata Mohapatra, Sang Phill Park, Anand Raghunathan, and Kaushik Roy. 2011. IMPACT: Imprecise adders for low-power approximate computing. In Proceedings of the 17th IEEE/ACM International Symposium on Low-power Electronics and Design. IEEE Press, 409--414.

Digital Library

[61]

Erik G. Hallnor and Steven K. Reinhardt. 2004. A compressed memory hierarchy using an indirect index cache. In Proceedings of the 3rd Workshop on Memory Performance Issues: In Conjunction with the 31st International Symposium on Computer Architecture. ACM, 9--15.

[62]

Maurice Herlihy, J. Eliot, and B. Moss. 1993. Transactional Memory: Architectural Support for Lock-Free Data Structures. Vol. 21. ACM.

Digital Library

[63]

T. Hoefler, T. Mehlan, F. Mietke, and W. Rehm. 2004. A survey of barrier algorithms for coarse grained supercomputers. Chemnitzer Informatik Berichte 4, 3 (2004).

[64]

Henry Hoffmann, Sasa Misailovic, Stelios Sidiroglou, Anant Agarwal, and Martin Rinard. 2009. Using code perforation to improve performance, reduce energy consumption, and respond to failures. Technical Report MIT-CSAIL-TR-2209-037, EECS, MIT.

[65]

Henry Hoffmann, Stelios Sidiroglou, Michael Carbin, Sasa Misailovic, Anant Agarwal, and Martin Rinard. 2011. Dynamic knobs for responsive power-aware computing. In ACM SIGPLAN Notices, Vol. 46. ACM, 199--212.

Digital Library

[66]

Chih-Chieh Hsiao, Slo-Li Chu, and Chen-Yu Chen. 2013. Energy-aware hybrid precision selection framework for mobile GPUs. Computers 8 Graphics 37, 5, 431--444.

[67]

Jiawei Huang, John Lach, and Gabriel Robins. 2012. A methodology for energy-quality tradeoff using imprecise hardware. In Proceedings of the 49th Annual Design Automation Conference. ACM, 504--509.

Digital Library

[68]

Jeremy Iverson, Chandrika Kamath, and George Karypis. 2012. Fast and effective lossy compression algorithms for scientific datasets. In Euro-Par 2012 Parallel Processing. Springer, 843--856.

[69]

Yuho Jin, Ki Hwan Yum, and Eun Jung Kim. 2008. Adaptive data compression for high-performance low-power on-chip networks. In Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 354--363.

[70]

Andrew B. Kahng and Seokhyeong Kang. 2012. Accuracy-configurable adder for approximate arithmetic designs. In Proceedings of the 49th Annual Design Automation Conference. ACM, 820--825.

[71]

Georgios Karakonstantis, Debabrata Mohapatra, and Kaushik Roy. 2012. Logic and memory design based on unequal error protection for voltage-scalable, robust and adaptive DSP systems. Journal of Signal Processing Systems 68, 3, 415--431.

Digital Library

[72]

Georgios Keramidas, Chrysa Kokkala, and Iakovos Stamoulis. 2015. Clumsy value cache: An approximate memoization technique for mobile GPU fragment shaders. In Workshop on Approximate Computing (WAPCO’15).

[73]

Daya Shanker Khudia and Scott Mahlke. 2014. Harnessing soft computations for low-budget fault tolerance. In 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’14). IEEE, 319--330.

Digital Library

[74]

Hyungjun Kim, Pritha Ghoshal, Boris Grot, Paul V. Gratz, and Daniel A. Jiménez. 2011. Reducing network-on-chip energy consumption through spatial locality speculation. In Proceedings of the 5th ACM/IEEE International Symposium on Networks-on-Chip. ACM, 233--240.

[75]

Chandra Krintz and Sezgin Sucu. 2006. Adaptive on-the-fly compression. IEEE Transactions on Parallel and Distributed Systems 17, 1, 15--24.

Digital Library

[76]

Parag Kulkarni, Puneet Gupta, and Milos Ercegovac. 2011. Trading accuracy for power with an underdesigned multiplier architecture. In 24th International Conference on VLSI Design (VLSI Design’11). IEEE, 346--351.

Digital Library

[77]

Didier Le Gall. 1991. MPEG: A video compression standard for multimedia applications. Communications of the ACM 34, 4, 46--58.

Digital Library

[78]

Jae Bum Lee and Chu Shik Jhon. 1998. Reducing coherence overhead of barrier synchronization in software DSMs. In Supercomputing’98: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing (CDROM). IEEE Computer Society, Washington, DC, USA, 1--18.

[79]

Jang-Soo Lee, Won-Kee Hong, and Shin-Dug Kim. 1999. Design and evaluation of a selective compressed memory system. In International Conference on Computer Design (ICCD’99). IEEE, 184--191.

[80]

Kangmin Lee, Se-Joong Lee, and Hoi-Jun Yoo. 2006. Low-power network-on-chip for high-performance SoC design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14, 2, 148--160.

Digital Library

[81]

Moon-Sang Lee, Young-Jae Kang, Joon-Won Lee, and Seung-Ryoul Maeng. 2002. OPTS: Increasing branch prediction accuracy under context switch. Microprocessors and Microsystems 26, 6, 291--300.

[82]

Sungju Lee, Heegon Kim, Yongwha Chung, and Daihee Park. 2012. Energy efficient image/video data transmission on commercial multi-core processors. Sensors 12, 11, 14647--14670.

[83]

Sangpil Lee, Keunsoo Kim, Gunjae Koo, Hyeran Jeon, Won Woo Ro, and Murali Annavaram. 2015. Warped-compression: Enabling power efficient GPUs through register compression. In Proceedings of the 42nd Annual International Symposium on Computer Architecture. ACM, 502--514.

Digital Library

[84]

Larkhoon Leem, Hyungmin Cho, Jason Bau, Quinn A. Jacobson, and Subhasish Mitra. 2010. ERSA: Error resilient system architecture for probabilistic applications. In Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’10). IEEE, 1560--1565.

[85]

Debra A. Lelewer and Daniel S. Hirschberg. 1987. Data compression. ACM Computing Surveys 19, 3, 261--296.

Digital Library

[86]

Krisda Lengwehasatit and Antonio Ortega. 2004. Scalable variable complexity approximate forward DCT. IEEE Transactions on Circuits and Systems for Video Technology 14, 11, 1236--1248.

Digital Library

[87]

Mikko H. Lipasti and John Paul Shen. 1996. Exceeding the dataflow limit via value prediction. In Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 226--237.

[88]

Mikko H. Lipasti, Christopher B. Wilkerson, and John Paul Shen. 1996. Value locality and load value prediction. ACM SIGOPS Operating Systems Review 30, 5, 138--147.

Digital Library

[89]

Shaoshan Liu, Christine Eisenbeis, and Jean-Luc Gaudiot. 2010. A theoretical framework for value prediction in parallel systems. In 39th International Conference on Parallel Processing (ICPP’10). IEEE, 11--20.

Digital Library

[90]

Song Liu, Karthik Pattabiraman, Thomas Moscibroda, and Benjamin G. Zorn. 2009. Flicker: Saving refresh-power in mobile devices through critical data partitioning. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’09).

[91]

Gabriel H. Loh, Nuwan Jayasena, M. Oskin, Mark Nutter, David Roberts, Mitesh Meswani, Dong Ping Zhang, and Mike Ignatowski. 2013. A processing in memory taxonomy and a case for studying fixed-function pim. In Workshop on Near-Data Processing (WoNDP’13).

[92]

Enrico Magli and Gabriella Olmo. 2003. Lossy predictive coding of SAR raw data. IEEE Transactions on Geoscience and Remote Sensing 41, 5, 977--987.

[93]

Milo M. K. Martin, Daniel J. Sorin, Harold W. Cain, Mark D. Hill, and Mikko H. Lipasti. 2001. Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing. In Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 328--337.

[94]

John M. Mellor-Crummey and Michael L. Scott. 1991. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems 9, 1, 21--65.

Digital Library

[95]

Jiayuan Meng, Srimat Chakradhar, and Anand Raghunathan. 2009. Best-effort parallel execution framework for recognition and mining applications. In IEEE International Symposium on Parallel 8 Distributed Processing (IPDPS’09). IEEE, 1--12.

[96]

Jiayuan Mengte, Anand Raghunathan, Srimat Chakradhar, and Surendra Byna. 2010. Exploiting the forgiving nature of applications for scalable parallel execution. IEEE International Symposium on Parallel 8 Distributed Processing (IPDPS). IEEE.

[97]

Joshua San Miguel, Mario Badr, and Natalie Enright Jerger. 2014. Load value approximation. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 127--139.

Digital Library

[98]

Sasa Misailovic, Michael Carbin, Sara Achour, Zichao Qi, and Martin C. Rinard. 2014. Chisel: Reliability-and accuracy-aware optimization of approximate computational kernels. In ACM SIGPLAN Notices, Vol. 49. ACM, 309--328.

[99]

Sasa Misailovic, Deokhwan Kim, and Martin Rinard. 2013. Parallelizing sequential programs with statistical accuracy tests. ACM Transactions on Embedded Computing Systems 12, 2s, 88.

Digital Library

[100]

Sasa Misailovic, Stelios Sidiroglou, Henry Hoffmann, and Martin Rinard. 2010. Quality of service profiling. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1. ACM, 25--34.

Digital Library

[101]

Sasa Misailovic, Stelios Sidiroglou, and Martin C. Rinard. 2012. Dancing with uncertainty. In Proceedings of the 2012 ACM Workshop on Relaxing Synchronization for Multicore and Manycore Scalability. ACM, 51--60.

[102]

Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Computing Surveys 48, 4, 62.

Digital Library

[103]

Debabrata Mohapatra, Vinay K. Chippa, Anand Raghunathan, and Kaushik Roy. 2011. Design of voltage-scalable meta-functions for approximate computing. In Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’11). IEEE, 1--6.

[104]

Debabrata Mohapatra, Georgios Karakonstantis, and Kaushik Roy. 2009. Significance driven computation: A voltage-scalable, variation-aware, quality-tuning motion estimator. In Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design. ACM, 195--200.

Digital Library

[105]

Thierry Moreau, Mark Wyse, Jacob Nelson, Adrian Sampson, Hadi Esmaeilzadeh, Luis Ceze, and Mark Oskin. 2015. SNNAP: Approximate computing on programmable socs via neural acceleration. In IEEE 21st International Symposium on High Performance Computer Architecture (HPCA’15). IEEE, 603--614.

[106]

Michel Mouly, Marie-Bernadette Pautet, and Thomas Foreword By-Haug. 1992. The GSM System for Mobile Communications. Telecom Publishing.

[107]

Tarun Nakra, Rajiv Gupta, and Mary Lou Soffa. 1999. Global context-based value prediction. In Proceedings of the 5th International Symposium on High-Performance Computer Architecture. IEEE, 4--12.

[108]

Sriram Narayanan, John Sartori, Rakesh Kumar, and Douglas L. Jones. 2010. Scalable stochastic processors. In Proceedings of the Conference on Design, Automation and Test in Europe. European Design and Automation Association, 335--338.

[109]

D. Nikolopoulos and T. Papatheodorou. 2000. Fast synchronization on scalable cache-coherent multiprocessors using hybrid primitives. In Proceedings of the 14th International Symposium on Parallel and Distributed Processing (IPDPS’00). IEEE Computer Society, Washington, DC, USA, 711.

[110]

Peter Noll. 1997. MPEG digital audio coding. IEEE Signal Processing Magazine 14, 5, 59--81.

[111]

NVIDIA. 2014. NVIDIA GTX 980 Whitepaper. Retrieved November 29, 2017 from https://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF.

[112]

Simon Ogg and Bashir Al-Hashimi. 2006. Improved data compression for serial interconnected network on chip through unused significant bit removal. In 19th International Conference on VLSI Design. Held jointly with 5th International Conference on Embedded Systems and Design. IEEE, 5 pp.

Digital Library

[113]

Soontorn Oraintara, Ying-Jui Chen, and Truong Q. Nguyen. 2002. Integer fast Fourier transform. IEEE Transactions on Signal Processing 50, 3, 607--618.

Digital Library

[114]

David J. Palframan, Nam Sung Kim, and Mikko H. Lipasti. 2014. Precision-aware soft error protection for GPUs. In IEEE 20th International Symposium on High Performance Computer Architecture (HPCA’14). IEEE, 49--59.

[115]

J. Thomas Pawlowski. 2011. Hybrid memory cube (HMC). In Hot Chips, Vol. 23.

[116]

Gennady Pekhimenko, Evgeny Bolotin, Mike O’Connor, Onur Mutlu, Todd C. Mowry, and Steve Keckler. 2015. Toggle-aware compression for GPUs. In IEEE Computer Architecture Letters. 14, 2 (2015), 164--168.

Digital Library

[117]

Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C. Mowry, and Stephen W. Keckler. 2016. A case for toggle-aware compression for GPU systems. In IEEE International Symposium on High Performance Computer Architecture (HPCA’16). IEEE, 188--200.

[118]

Arthur Perais and André Seznec. 2014. EOLE: Paving the way for an effective implementation of value prediction. In ACM SIGARCH Computer Architecture News, Vol. 42. IEEE Press, 481--492.

Digital Library

[119]

Arthur Perais and André Seznec. 2014. Practical data value speculation for future high-end processors. In IEEE 20th International Symposium on High Performance Computer Architecture (HPCA’14). IEEE, 428--439.

[120]

Arthur Perais and André Seznec. 2015. BeBoP: A cost effective predictor infrastructure for superscalar value prediction. In IEEE 21st International Symposium on High Performance Computer Architecture (HPCA’15). IEEE, 13--25.

[121]

Calton Pu and Lenin Singaravelu. 2005. Fine-grain adaptive compression in dynamically variable networks. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS’05). IEEE, 685--694.

Digital Library

[122]

Abbas Rahimi, Amirali Ghofrani, Kwang-Ting Cheng, Luca Benini, and Rajesh K. Gupta. 2015. Approximate associative memristive memory for energy-efficient GPUs. In Proceedings of the 2015 Design, Automation 8 Test in Europe Conference 8 Exhibition. EDA Consortium, 1497--1502.

[123]

Vara Ramakrishnan and Isaac D. Scherson. 1999. Efficient techniques for nested and disjoint barrier synchronization. Journal of Parallel and Distributed Computing 58, 2, 333--356.

Digital Library

[124]

Easwaran Raman, Ram Rangan, David I. August, and others. 2008. Spice: Speculative parallel iteration chunk execution. In Proceedings of the 6th Annual IEEE/ACM International Symposium on Code Generation and Optimization. ACM, 175--184.

Digital Library

[125]

Ashish Ranjan, Swagath Venkataramani, Xuanyao Fong, Kaushik Roy, and Anand Raghunathan. 2015. Approximate storage for energy efficient spintronic memories. In Proceedings of the 52nd Annual Design Automation Conference. ACM, 195.

Digital Library

[126]

Lakshminarayanan Renganarayana, Vijayalakshmi Srinivasan, Ravi Nair, and Daniel Prener. 2012. Programming with relaxed synchronization. In Proceedings of the 2012 ACM Workshop on Relaxing Synchronization for Multicore and Manycore Scalability. ACM, 41--50.

Digital Library

[127]

Martin Rinard. 2013. Parallel synchronization-free approximate data structure construction. In HotPar.

[128]

Martin C. Rinard. 2012. Unsynchronized techniques for approximate parallel computing. In RACES Workshop.

[129]

Antonio Roldao-Lopes, Amir Shahzad, George A. Constantinides, and Eric C. Kerrigan. 2009. More flops or more precision? Accuracy parameterizable linear equation solvers for model predictive control. In 17th IEEE Symposium on Field Programmable Custom Computing Machines (FCCM’09). IEEE, 209--216.

[130]

Cindy Rubio-González, Cuong Nguyen, Hong Diep Nguyen, James Demmel, William Kahan, Koushik Sen, David H. Bailey, Costin Iancu, and David Hough. 2013. Precimonious: Tuning assistant for floating-point precision. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. ACM, 27.

Digital Library

[131]

David Salomon. 2004. Data Compression: The Complete Reference. Springer Science 8 Business Media, New York, NY.

Digital Library

[132]

Mehrzad Samadi, Davoud Anoushe Jamshidi, Janghaeng Lee, and Scott Mahlke. 2014. Paraprox: Pattern-based approximation for data parallel applications. In ACM SIGARCH Computer Architecture News, Vol. 42. ACM, 35--50.

[133]

Mehrzad Samadi, Janghaeng Lee, D. Anoushe Jamshidi, Amir Hormati, and Scott Mahlke. 2013. Sage: Self-tuning approximation for graphics engines. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 13--24.

[134]

Adrian Sampson, Werner Dietl, Emily Fortuna, Danushen Gnanapragasam, Luis Ceze, and Dan Grossman. 2011. EnerJ: Approximate data types for safe and general low-power computation. In ACM SIGPLAN Notices, Vol. 46. ACM, 164--174.

[135]

Adrian Sampson, Jacob Nelson, Karin Strauss, and Luis Ceze. 2014. Approximate storage in solid-state memories. ACM Transactions on Computer Systems 32, 3, 9.

[136]

Jack Sampson, Ruben Gonzalez, Jean-Francois Collard, Norman P. Jouppi, Mike Schlansker, and Brad Calder. 2006. Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, Washington, DC, USA, 235--246.

[137]

Joshua San Miguel and N. Enright Jerger. 2014. Load value approximation: Approaching the ideal memory access latency. In Workshop on Approximate Computing Across the System Stack.

[138]

J. Sartori and R. Kumar. 2010. Low-overhead, high-speed multi-core barrier synchronization. In HiPEAC. 18--34.

[139]

John Sartori and Rakesh Kumar. 2013. Branch and data herding: Reducing control and memory divergence for error-tolerant GPU applications. IEEE Transactions on Multimedia 15, 2, 279--290.

[140]

Vijay Sathish, Michael J. Schulte, and Nam Sung Kim. 2012. Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques. ACM, 325--334.

[141]

Yiannakis Sazeides and James E. Smith. 1997. The predictability of data values. In Proceedings of the 30th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE, 248--258.

[142]

Steven L. Scott. 1996. Synchronization and communication in the T3E multiprocessor. SIGOPS Operating Systems Review 30, 5, 26--36.

[143]

Marko Scrbak, Mahzabeen Islam, Krishna M. Kavi, Mike Ignatowski, and Nuwan Jayasena. 2015. Processing-in-memory: Exploring the design space. In Architecture of Computing Systems (ARCS’15). Springer, 43--54.

[144]

André Seznec. 2011. A new case for the TAGE branch predictor. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 117--127.

[145]

Ali Shafiee, Meysam Taassori, Rajeev Balasubramonian, and A. K. Davis. 2014. MemZip: Exploring unconventional benefits from memory compression. In IEEE 20th International Symposium on High Performance Computer Architecture (HPCA’14). IEEE, 638--649.

[146]

Li Shang, Li-Shiuan Peh, and Niraj K. Jha. 2003. Dynamic voltage scaling with links for power optimization of interconnection networks. In Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA’03). IEEE, 91--102.

[147]

Shisheng Shang and Kai Hwang. 1995. Distributed hardwired barrier synchronization for scalable multiprocessor clusters. IEEE Transactions on Parallel and Distributed Systems 6, 6, 591--605.

[148]

Majid Shoushtari, Abbas BanaiyanMofrad, and Nikil Dutt. 2015. Exploiting partially-forgetful memories for approximate computing. IEEE Embedded Systems Letters 7, 1, 19--22.

[149]

Stelios Sidiroglou-Douskos, Sasa Misailovic, Henry Hoffmann, and Martin Rinard. 2011. Managing performance vs. accuracy trade-offs with loop perforation. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering. ACM, 124--134.

[150]

María Soler and José Flich. 2013. Power saving by NoC traffic compression. In European Conference on Parallel Processing. Springer, 465--476.

[151]

Renée St. Amant, Amir Yazdanbakhsh, Jongse Park, Bradley Thwaites, Hadi Esmaeilzadeh, Arjang Hassibi, Luis Ceze, and Doug Burger. 2014. General-purpose code acceleration with limited-precision analog computation. ACM SIGARCH Computer Architecture News 42, 3, 505--516.

[152]

J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry. 2002. Improving value communication for thread-level speculation. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 65--75.

[153]

Ayswarya Sundaram, Ameen Aakel, Derek Lockhart, Darshan Thaker, and Diana Franklin. 2008. Efficient fault tolerance in multi-media applications through selective instruction replication. In Proceedings of the 2008 Workshop on Radiation Effects and Fault Tolerance in Nanometer Technologies. ACM, 339--346.

[154]

Mark Sutherland, Joshua San Miguel, and Natalie Enright Jerger. 2015. Texture cache approximation on GPUs. In Workshop on Approximate Computing Across the Stack.

[155]

M. B. Taylor. 2012. Is dark silicon useful? Harnessing the four horsemen of the coming dark silicon apocalypse. In Design Automation Conference.

[156]

Bradley Thwaites, Gennady Pekhimenko, Hadi Esmaeilzadeh, Amir Yazdanbakhsh, Onur Mutlu, Jongse Park, Girish Mururu, and Todd Mowry. 2014. Rollback-free value prediction with approximate loads. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation. ACM, 493--494.

[157]

Ye Tian, Qian Zhang, Ting Wang, Feng Yuan, and Qiang Xu. 2015. ApproxMA: Approximate memory access for dynamic precision scaling. In Proceedings of the 25th Edition on Great Lakes Symposium on VLSI. ACM, 337--342.

[158]

Vassilis Vassiliadis, Konstantinos Parasyris, Charalambos Chalios, Christos D. Antonopoulos, Spyros Lalis, Nikolaos Bellas, Hans Vandierendonck, and Dimitrios S. Nikolopoulos. 2015. A programming model and runtime system for significance-aware energy-efficient computing. In ACM SIGPLAN Notices, Vol. 50. ACM, 275--276.

[159]

Swagath Venkataramani, Srimat T. Chakradhar, Kaushik Roy, and Anand Raghunathan. 2015. Approximate computing and the quest for computing efficiency. In Proceedings of the 52nd Annual Design Automation Conference. ACM, 120.

[160]

Swagath Venkataramani, Vinay K. Chippa, Srimat T. Chakradhar, Kaushik Roy, and Anand Raghunathan. 2013. Quality programmable vector processors for approximate computing. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 1--12.

[161]

Swagath Venkataramani, Ashish Ranjan, Kaushik Roy, and Anand Raghunathan. 2014. AxNN: Energy-efficient neuromorphic systems using approximate computing. In Proceedings of the 2014 International Symposium on Low Power Electronics and Design. ACM, 27--32.

[162]

Swagath Venkataramani, Kaushik Roy, and Anand Raghunathan. 2013. Substitute-and-simplify: A unified design paradigm for approximate and quality configurable circuits. In Proceedings of the Conference on Design, Automation and Test in Europe. EDA Consortium, 1367--1372.

[163]

Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita Das, Mahmut Kandemir, Todd C. Mowry, and Onur Mutlu. 2015. A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps. In ACM SIGARCH Computer Architecture News, Vol. 43. ACM, 41--53.

[164]

Oreste Villa, Gianluca Palermo, and Cristina Silvano. 2008. Efficiency and scalability of barrier synchronization on NoC based many-core architectures. In CASES’08: Proceedings of the 2008 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM, New York, NY, USA, 81--90.

[165]

Gregory K. Wallace. 1992. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38, 1, xviii--xxxiv.

[166]

Kai Wang and Manoj Franklin. 1997. Highly accurate data value prediction using hybrid predictors. In Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 281--290.

[167]

Zhou Wang, Eero P. Simoncelli, and Alan C. Bovik. 2003. Multiscale structural similarity for image quality assessment. In Conference Record of the 37th Asilomar Conference on Signals, Systems and Computers, Vol. 2. IEEE, 1398--1402.

[168]

Terry A. Welch. 1984. A technique for high-performance data compression. Computer 17, 6, 8--19.

[169]

Benjamin Welton, Dries Kimpe, Jason Cope, Christina M. Patrick, Kamil Iskra, and Robert Ross. 2011. Improving I/O forwarding throughput with data compression. In IEEE International Conference on Cluster Computing (CLUSTER’11). IEEE, 438--445.

[170]

Yair Wiseman, Karsten Schwan, and Patrick Widener. 2005. Efficient end to end data exchange using configurable compression. ACM SIGOPS Operating Systems Review 39, 3, 4--23.

[171]

Qiang Xu, Todd Mytkowicz, and Nam Sung Kim. 2016. Approximate computing: A survey. IEEE Design & Test 33, 1, 8--22.

[172]

Xin Xu and H. Howie Huang. 2015. Exploring data-level error tolerance in high-performance solid-state drives. IEEE Transactions on Reliability 64, 1, 15--30.

[173]

Amir Yazdanbakhsh, Gennady Pekhimenko, Bradley Thwaites, Hadi Esmaeilzadeh, Onur Mutlu, and Todd C. Mowry. 2016. RFVP: Rollback-free value prediction with safe-to-approximate loads. ACM Transactions on Architecture and Code Optimization 12, 4, 62.

[174]

Rong Ye, Ting Wang, Feng Yuan, Rakesh Kumar, and Qiang Xu. 2013. On reconfiguration-oriented approximate adder design and its application. In Proceedings of the International Conference on Computer-Aided Design. IEEE Press, 48--54.

[175]

Yavuz Yetim, Sharad Malik, and Margaret Martonosi. 2015. CommGuard: Mitigating communication errors in error-prone parallel execution. In ACM SIGPLAN Notices, Vol. 50. ACM, 311--323.

[176]

Yavuz Yetim, Margaret Martonosi, and Sharad Malik. 2013. Extracting useful computation from error-prone processors for streaming applications. In Design, Automation & Test in Europe Conference & Exhibition (DATE’13). IEEE, 202--207.

[177]

Felix Zahn, Steffen Lammel, and Holger Fröning. 2017. Early experiences with saving energy in direct interconnection networks. In IEEE 3rd International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB’17). IEEE, 33--40.

[178]

Felix Zahn, Pedro Yebenes, Steffen Lammel, Pedro J. Garcia, and Holger Fröning. 2016. Analyzing the energy (dis-)proportionality of scalable interconnection networks. In 2nd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB’16). IEEE, 25--32.

[179]

Hang Zhang, Mateja Putic, and John Lach. 2014. Low power GPGPU computation with imprecise hardware. In Proceedings of the 51st Annual Design Automation Conference. ACM, 1--6.

[180]

Qian Zhang, Ting Wang, Ye Tian, Feng Yuan, and Qiang Xu. 2015. ApproxANN: An approximate computing framework for artificial neural network. In Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition. EDA Consortium, 701--706.

[181]

Huiyang Zhou and Thomas M. Conte. 2005. Enhancing memory-level parallelism via recovery-free value prediction. IEEE Transactions on Computers 54, 7, 897--912.

[182]

Qiuling Zhu, Bilal Akin, H. Ekin Sumbul, Fazle Sadi, James C. Hoe, Larry Pileggi, and Franz Franchetti. 2013. A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing. In IEEE International 3D Systems Integration Conference (3DIC’13). IEEE, 1--7.

[183]

Weirong Zhu, Vugranam C. Sreedhar, Ziang Hu, and Guang R. Gao. 2007. Synchronization state buffer: Supporting efficient fine-grain synchronization on many-core architectures. In Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA’07). ACM, New York, NY, USA, 35--45.

Cited By

Reza M, Yeazel A (2024) Mapping Model and Heuristics for Accelerating Deep Neural Networks and for Energy-Efficient Networks-on-Chip. SoutheastCon 2024 (119-126). Online publication date: 15-Mar-2024
https://doi.org/10.1109/SoutheastCon52093.2024.10500232
Damsgaard H, Grenier A, Katare D, Taufique Z, Shakibhamedan S, Troccoli T, Chatzitsompanis G, Kanduri A, Ometov A, Ding A, Taherinejad N, Karakonstantis G, Woods R, Nurmi J (2024) Adaptive approximate computing in edge AI and IoT applications: A review. Journal of Systems Architecture 150 (103114). Online publication date: May-2024
https://doi.org/10.1016/j.sysarc.2024.103114
Liu D, Ren X, Wu J, Liu W, Zhao J, Peng S (2024) Pipe-AGCM: A Fine-Grain Pipelining Scheme for Optimizing the Parallel Atmospheric General Circulation Model. Euro-Par 2024: Parallel Processing (283-297). Online publication date: 26-Aug-2024
https://doi.org/10.1007/978-3-031-69583-4_20

Index Terms

Approximate Communication: Techniques for Reducing Communication Bottlenecks in Large-Scale Parallel Systems
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
2. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
      1. Massively parallel algorithms

Published In

ACM Computing Surveys Volume 51, Issue 1

January 2019

743 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3177787

Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 January 2018

Accepted: 01 September 2017

Revised: 01 July 2017

Received: 01 July 2016

Published in CSUR Volume 51, Issue 1

Qualifiers

Survey
Research
Refereed
