survey

Approximate Communication: Techniques for Reducing Communication Bottlenecks in Large-Scale Parallel Systems

Published: 10 January 2018

Abstract

Approximate computing has recently gained research attention as a way to increase energy efficiency and/or performance by exploiting the intrinsic error resilience of some applications. However, little attention has been given to its potential for addressing the communication bottleneck, which remains one of the looming challenges for efficient parallelism. This article explores the potential benefits of approximate computing for communication reduction by surveying three promising techniques for approximate communication: compression, relaxed synchronization, and value prediction. The techniques are compared using an evaluation framework composed of communication cost reduction, performance, energy reduction, applicability, overheads, and output degradation. The comparison shows that lossy link compression and approximate value prediction hold great promise for reducing the communication bottleneck in bandwidth-constrained applications. Relaxed synchronization, meanwhile, provides large speedups for select error-tolerant applications but suffers from limited general applicability and unreliable guarantees on output degradation. Finally, the article concludes with several suggestions for future research on approximate communication techniques.
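To make the lossy-compression idea concrete, the following is a minimal sketch and not a technique taken from the article itself. It assumes one simple scheme: dropping low-order mantissa bits of IEEE 754 single-precision values before they are sent over a link. The function names, the sample payload, and the choice of 7 kept mantissa bits are all hypothetical and chosen only for illustration.

```python
import struct


def truncate_float32(value: float, kept_mantissa_bits: int) -> float:
    """Zero the low-order mantissa bits of an IEEE 754 float32 (illustrative only)."""
    if not 0 <= kept_mantissa_bits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    dropped = 23 - kept_mantissa_bits
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # Keep the sign, the exponent, and the top mantissa bits; clear the rest.
    mask = (0xFFFFFFFF << dropped) & 0xFFFFFFFF
    (approx,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return approx


def bits_per_value(kept_mantissa_bits: int) -> int:
    """Bits that must cross the link: 1 sign + 8 exponent + kept mantissa bits."""
    return 1 + 8 + kept_mantissa_bits


def max_relative_error(values, kept_mantissa_bits: int) -> float:
    """Worst-case relative error introduced across a payload of nonzero values."""
    worst = 0.0
    for v in values:
        if v != 0.0:
            approx = truncate_float32(v, kept_mantissa_bits)
            worst = max(worst, abs(approx - v) / abs(v))
    return worst


# Hypothetical payload and precision setting, not taken from the survey.
payload = [0.1 * i for i in range(1, 1001)]
KEPT = 7
error = max_relative_error(payload, KEPT)
saving = 1 - bits_per_value(KEPT) / 32

print(f"bits per value: {bits_per_value(KEPT)} of 32")
print(f"traffic saved: {saving:.0%}, worst relative error: {error:.2e}")
```

The sketch shows the trade-off the abstract's evaluation framework names: fewer bits per value reduce communication cost, at the price of a bounded output degradation that an error-tolerant application may be able to absorb. Whether a given error bound is acceptable depends on the application.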

References

[1]
Tor M. Aamodt and Paul Chow. 2008. Compile-time and instruction-set methods for improving floating-to fixed-point conversion accuracy. ACM Transactions on Embedded Computing Systems 7, 3, 26.
[2]
Bülent Abali, Hubertus Franke, Dan E. Poff, Robert A. Saccone, Jr., Charles O. Schulz, Lorraine M. Herger, and T. Basil Smith. 2001. Memory expansion technology (MXT): software support and performance. IBM Journal of Research and Development 45, 2, 287--301.
[3]
Don Adams. 1993. CRAY T3D System Architecture Overview Manual. Retrieved November 29, 2017 from ftp://ftp.cray.com/product-info/mpp/T3D_Architecture_Over/T3D.overview.html.
[4]
Ismail Akturk, Karen Khatamifard, and Ulya R. Karpuzcu. 2015. On quantification of accuracy loss in approximate computing. In Workshop on Duplicating, Deconstructing and Debunking (WDDD’15). 15.
[5]
Alaa R. Alameldeen and David A. Wood. 2004. Adaptive cache compression for high-performance processors. In Proceedings of the 31st Annual International Symposium on Computer Architecture. IEEE, 212--223.
[6]
Alaa R. Alameldeen and David A. Wood. 2007. Interactions between compression and prefetching in chip multiprocessors. In IEEE 13th International Symposium on High Performance Computer Architecture (HPCA’07). IEEE, 228--239.
[7]
George Almási, Philip Heidelberger, Charles J. Archer, Xavier Martorell, C. Chris Erway, José E. Moreira, B. Steinmacher-Burow, and Yili Zheng. 2005. Optimization of MPI collective communication on BlueGene/L systems. In Proceedings of the 19th Annual International Conference on Supercomputing (ICS’05). ACM, New York, NY, 253--262.
[8]
Carlos Alvarez, Jesus Corbal, and Mateo Valero. 2005. Fuzzy memoization for floating-point multimedia applications. IEEE Transactions on Computers 54, 7, 922--927.
[9]
Gene M. Amdahl. 1967. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, Spring Joint Computer Conference. ACM, 483--485.
[10]
Baik Song An, Manhee Lee, Ki Hwan Yum, and Eun Jung Kim. 2012. Efficient data packet compression for cache coherent multiprocessor systems. In Data Compression Conference (DCC’12). IEEE, 129--138.
[11]
Mohammad Ashraful Anam, Paul N. Whatmough, and Yiannis Andreopoulos. 2013. Precision-energy-throughput scaling of generic matrix multiplication and discrete convolution kernels via linear projections. In IEEE 11th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia’13). IEEE, 21--30.
[12]
Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, and Saman Amarasinghe. 2009. PetaBricks: A Language and Compiler for Algorithmic Choice. Vol. 44. ACM.
[13]
Jason Ansel, Yee Lok Wong, Cy Chan, Marek Olszewski, Alan Edelman, and Saman Amarasinghe. 2011. Language and compiler support for auto-tuning variable-accuracy algorithms. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization. IEEE Computer Society, 85--96.
[14]
Woongki Baek and Trishul M. Chilimbi. 2010. Green: A framework for supporting energy-conscious programming using controlled approximation. In ACM SIGPLAN Notices, Vol. 45. ACM, 198--209.
[15]
Arnab Banerjee, Pascal T. Wolkotte, Robert D. Mullins, Simon W. Moore, and Gerard J. M. Smit. 2009. An energy and performance exploration of network-on-chip architectures. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 17, 3, 319--329.
[16]
Carl J. Beckmann and Constantine D. Polychronopoulos. 1990. Fast barrier synchronization hardware. In Proceedings of the 1990 ACM/IEEE Conference on Supercomputing (Supercomputing’90). IEEE Computer Society, Washington, DC, USA, 180--189.
[17]
K. Bergman and others. 2008. Exascale computing study: Technology challenges in achieving exascale systems. Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Tech. Rep 15 (2008).
[18]
Tekin Bicer, Jian Yin, Dereck Chiu, Gagan Agrawal, and Karen Schuchardt. 2013. Integrating online compression to accelerate large-scale data analytics applications. In IEEE 27th International Symposium on Parallel 8 Distributed Processing (IPDPS’13). IEEE, 1205--1216.
[19]
Mark Buckler, Wayne Burleson, and Greg Sadowski. 2013. Low-power networks-on-chip: Progress and remaining challenges. In 2013 IEEE International Symposium on Low Power Electronics and Design (ISLPED’13). IEEE, 132--134.
[20]
Huy Bui, Hal Finkel, Venkatram Vishwanath, Salman Habib, Katrin Heitmann, Jason Leigh, Michael Papka, and Kevin Harms. 2014. Scalable parallel I/O on a blue gene/Q supercomputer using compression, topology-aware data aggregation, and subfiling. In 22nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP’14). IEEE, 107--111.
[21]
Surendra Byna, Jiayuan Meng, Anand Raghunathan, Srimat Chakradhar, and Srihari Cadambi. 2010. Best-effort semantic document search on GPUs. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units. ACM, 86--93.
[22]
Brad Calder, Glenn Reinman, and Dean M. Tullsen. 1999. Selective value prediction. In Proceedings of the 26th International Symposium on Computer Architecture. IEEE, 64--74.
[23]
Ramon Canal, Antonio González, and James E. Smith. 2000. Very low power pipelines using significance compression. In Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture. ACM, 181--190.
[24]
Vito Cappellini. 1985. Data Compression and Error Control Techniques with Applications. Academic Press, Inc., Cambridge, MA.
[25]
Michael Carbin, Sasa Misailovic, and Martin C. Rinard. 2013. Verifying quantitative reliability for programs that execute on unreliable hardware. In ACM SIGPLAN Notices, Vol. 48. ACM, 33--52.
[26]
Srimat T. Chakradhar and Anand Raghunathan. 2010. Best-effort computing: Re-thinking parallel software and hardware. In 47th ACM/IEEE Design Automation Conference (DAC’10). IEEE, 865--870.
[27]
Jie Chen and W. Watson. 2008. Software barrier performance on dual quad-core opterons. International Conference on Networking, Architecture, and Storage, 2008 (NAS’08). 303--309.
[28]
Yen-Kuang Chen, Jatin Chhugani, Pradeep Dubey, Christopher J. Hughes, Daehyun Kim, Sanjeev Kumar, Victor W. Lee, Anthony D. Nguyen, and Mikhail Smelyanskiy. 2008. Convergence of recognition, mining, and synthesis workloads and its implications. Proceedings of IEEE 96, 5, 790--807.
[29]
Vinay K. Chippa, Hrishikesh Jayakumar, Debabrata Mohapatra, Kaushik Roy, and Anand Raghunathan. 2013. Energy-efficient recognition and mining processor using scalable effort design. In IEEE Custom Integrated Circuits Conference (CICC’13). IEEE, 1--4.
[30]
Vinay Kumar Chippa, Debabrata Mohapatra, Kaushik Roy, Srimat T. Chakradhar, and Anand Raghunathan. 2014. Scalable effort hardware design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22, 9, 2004--2016.
[31]
Vinay K. Chippa, Swagath Venkataramani, Srimat T. Chakradhar, Kaushik Roy, and Anand Raghunathan. 2013. Approximate computing: An integrated hardware approach. In Asilomar Conference on Signals, Systems and Computers. IEEE, 111--117.
[32]
Vinay K. Chippa, Swagath Venkataramani, Kaushik Roy, and Anand Raghunathan. 2014. StoRM: A stochastic recognition and mining processor. In Proceedings of the 2014 International Symposium on Low Power Electronics and Design. ACM, 39--44.
[33]
Kyungsang Cho, Yongjun Lee, Young H. Oh, Gyoo-cheol Hwang, and Jae W. Lee. 2014. eDRAM-based tiered-reliability memory with applications to low-power frame buffers. In IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED’14). IEEE, 333--338.
[34]
Marcelo Cintra and Josep Torrellas. 2002. Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 43--54.
[35]
R. J. Cintra. 2011. An integer approximation method for discrete sinusoidal transforms. Circuits, Systems, and Signal Processing 30, 6, 1481--1501.
[36]
Renato J. Cintra and Fábio M. Bayer. 2011. A DCT approximation for image compression. IEEE Signal Processing Letters 18, 10, 579--582.
[37]
Daniel Citron and Larry Rudolph. 1995. Creating a wider bus using caching techniques. In Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture. IEEE, 90--99.
[38]
Paul Coteus, H. Randall Bickford, Thomas M. Cipolla, Paul Crumley, Alan Gara, Shawn Hall, Gerard V. Kopcsay, Alphonso P. Lanzetta, Lawrence S. Mok, Rick A. Rand, Richard A. Swetz, Todd Takken, Paul La Rocca, Christopher Marroquin, Philip R. Germann, and Mark J. Jeanson. 2005. Packaging the blue gene/L supercomputer. IBM Journal of Research and Development 49, 2--3, 213--248.
[39]
David E. Culler, Jaswinder Pal Singh, and Anoop Gupta. 1999. Parallel Computer Architecture: A Hardware/Software Approach. Gulf Professional Publishing, Houston, TX.
[40]
William J. Dally and Brian Towles. 2001. Route packets, not wires: On-chip interconnection networks. In Proceedings of the Design Automation Conference. IEEE, 684--689.
[41]
Reetuparna Das, Asit K. Mishra, Chrysostomos Nicopoulos, Dongkook Park, Vijaykrishnan Narayanan, Ravishankar Iyer, Mazin S. Yousif, and Chita R. Das. 2008. Performance and power optimization through data compression in network-on-chip architectures. In IEEE 14th International Symposium on High Performance Computer Architecture (HPCA’08). IEEE, 215--225.
[42]
Marc De Kruijf, Shuou Nomura, and Karthikeyan Sankaralingam. 2010. Relax: An architectural framework for software recovery of hardware faults. In ACM SIGARCH Computer Architecture News, Vol. 38. ACM, 497--508.
[43]
Li Deng and Douglas O’Shaughnessy. 2003. Speech Processing: A Dynamic and Optimization-Oriented Approach. CRC Press, Boca Raton, FL.
[44]
J. Dongarra, P. Luszczek, and A. Petitet. 2003. The LINPACK benchmark: Past, present, and future. Concurrency and Computation: Practice and Experience 15, 9, 803--820.
[45]
Zidong Du, Avinash Lingamneni, Yunji Chen, Krishna Palem, Olivier Temam, and Chengyong Wu. 2014. Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators. In 19th Asia and South Pacific Design Automation Conference (ASP-DAC’14). IEEE, 201--206.
[46]
Peter Düben, Jeremy Schlachter, Sreelatha Yenugula, John Augustine, Christian Enz, K. Palem, T. N. Palmer, and others. 2015. Opportunities for energy efficient computing: A study of inexact general purpose processors for high-performance and big-data applications. In Proceedings of the 2015 Design, Automation 8 Test in Europe Conference 8 Exhibition. EDA Consortium, 764--769.
[47]
Pradeep Dubey. 2005. Recognition, mining and synthesis moves computers to the era of Tera. Technology@ Intel Magazine 9, 2, 1--10.
[48]
Peter Elias. 1955. Predictive coding--I. IRE Transactions on Information Theory 1, 1, 16--24.
[49]
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In International Symposium on Computer Architecture.
[50]
Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. 2012. Architecture support for disciplined approximate programming. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems.
[51]
Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. 2012. Neural acceleration for general-purpose approximate programs. In Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 449--460.
[52]
Marius Evers, Po-Yung Chang, and Yale N. Patt. 1996. Using hybrid branch predictors to improve branch prediction accuracy in the presence of context switches. In ACM SIGARCH Computer Architecture News, Vol. 24. ACM, 3--11.
[53]
Yuntan Fang, Huawei Li, and Xiaowei Li. 2012. SoftPCM: Enhancing energy efficiency and lifetime of phase change memory in video applications via approximate write. In IEEE 21st Asian Test Symposium (ATS’12). IEEE, 131--136.
[54]
Eric Freudenthal and Olivier Peze. 1988. Efficient Synchronization Algorithms Using Fetch-and-Add on Multiple Bitfield Integers. Ultracomputer Note 148.
[55]
Shrikanth Ganapathy, Georgios Karakonstantis, Adam Teman, and Andreas Burg. 2015. Mitigating the impact of faults in unreliable memories for error-resilient applications. In Proceedings of the 52nd Annual Design Automation Conference. ACM, 102.
[56]
Bart Goeman, Hans Vandierendonck, and Koen De Bosschere. 2001. Differential FCM: Increasing value prediction accuracy by improving table usage efficiency. In 7th International Symposium on High-Performance Computer Architecture (HPCA’01). IEEE, 207--216.
[57]
Inigo Goiri, Ricardo Bianchini, Santosh Nagarakatte, and Thu D. Nguyen. 2015. Approxhadoop: Bringing approximations to mapreduce frameworks. In ACM SIGARCH Computer Architecture News, Vol. 43. ACM, 383--397.
[58]
Jill R. Goldschneider. 1997. Lossy Compression of Scientific Data Via Wavelets and Vector Quantization. Ph.D. thesis, University of Washington, Seattle, WA. https://digital.lib.washington.edu/researchworks/handle/1773/5881?show=full.
[59]
Beayna Grigorian and Glenn Reinman. 2015. Accelerating divergent applications on SIMD architectures using neural networks. ACM Transactions on Architecture and Code Optimization 12, 1, 2.
[60]
Vaibhav Gupta, Debabrata Mohapatra, Sang Phill Park, Anand Raghunathan, and Kaushik Roy. 2011. IMPACT: Imprecise adders for low-power approximate computing. In Proceedings of the 17th IEEE/ACM International Symposium on Low-power Electronics and Design. IEEE Press, 409--414.
[61]
Erik G. Hallnor and Steven K. Reinhardt. 2004. A compressed memory hierarchy using an indirect index cache. In Proceedings of the 3rd Workshop on Memory Performance Issues: In Conjunction with the 31st International Symposium on Computer Architecture. ACM, 9--15.
[62]
Maurice Herlihy, J. Eliot, and B. Moss. 1993. Transactional Memory: Architectural Support for Lock-Free Data Structures. Vol. 21. ACM.
[63]
T. Hoefler, T. Mehlan, F. Mietke, and W. Rehm. 2004. A survey of barrier algorithms for coarse grained supercomputers. Chemnitzer Informatik Berichte 4, 3 (2004).
[64]
Henry Hoffmann, Sasa Misailovic, Stelios Sidiroglou, Anant Agarwal, and Martin Rinard. 2009. Using code perforation to improve performance, reduce energy consumption, and respond to failures. Technical Report MIT-CSAIL-TR-2209-037, EECS, MIT.
[65]
Henry Hoffmann, Stelios Sidiroglou, Michael Carbin, Sasa Misailovic, Anant Agarwal, and Martin Rinard. 2011. Dynamic knobs for responsive power-aware computing. In ACM SIGPLAN Notices, Vol. 46. ACM, 199--212.
[66]
Chih-Chieh Hsiao, Slo-Li Chu, and Chen-Yu Chen. 2013. Energy-aware hybrid precision selection framework for mobile GPUs. Computers 8 Graphics 37, 5, 431--444.
[67]
Jiawei Huang, John Lach, and Gabriel Robins. 2012. A methodology for energy-quality tradeoff using imprecise hardware. In Proceedings of the 49th Annual Design Automation Conference. ACM, 504--509.
[68]
Jeremy Iverson, Chandrika Kamath, and George Karypis. 2012. Fast and effective lossy compression algorithms for scientific datasets. In Euro-Par 2012 Parallel Processing. Springer, 843--856.
[69]
Yuho Jin, Ki Hwan Yum, and Eun Jung Kim. 2008. Adaptive data compression for high-performance low-power on-chip networks. In Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 354--363.
[70]
Andrew B. Kahng and Seokhyeong Kang. 2012. Accuracy-configurable adder for approximate arithmetic designs. In Proceedings of the 49th Annual Design Automation Conference. ACM, 820--825.
[71]
Georgios Karakonstantis, Debabrata Mohapatra, and Kaushik Roy. 2012. Logic and memory design based on unequal error protection for voltage-scalable, robust and adaptive DSP systems. Journal of Signal Processing Systems 68, 3, 415--431.
[72]
Georgios Keramidas, Chrysa Kokkala, and Iakovos Stamoulis. 2015. Clumsy value cache: An approximate memoization technique for mobile GPU fragment shaders. In Workshop on Approximate Computing (WAPCO’15).
[73]
Daya Shanker Khudia and Scott Mahlke. 2014. Harnessing soft computations for low-budget fault tolerance. In 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’14). IEEE, 319--330.
[74]
Hyungjun Kim, Pritha Ghoshal, Boris Grot, Paul V. Gratz, and Daniel A. Jiménez. 2011. Reducing network-on-chip energy consumption through spatial locality speculation. In Proceedings of the 5th ACM/IEEE International Symposium on Networks-on-Chip. ACM, 233--240.
[75]
Chandra Krintz and Sezgin Sucu. 2006. Adaptive on-the-fly compression. IEEE Transactions on Parallel and Distributed Systems 17, 1, 15--24.
[76]
Parag Kulkarni, Puneet Gupta, and Milos Ercegovac. 2011. Trading accuracy for power with an underdesigned multiplier architecture. In 24th International Conference on VLSI Design (VLSI Design’11). IEEE, 346--351.
[77]
Didier Le Gall. 1991. MPEG: A video compression standard for multimedia applications. Communications of the ACM 34, 4, 46--58.
[78]
Jae Bum Lee and Chu Shik Jhon. 1998. Reducing coherence overhead of barrier synchronization in software DSMs. In Supercomputing’98: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing (CDROM). IEEE Computer Society, Washington, DC, USA, 1--18.
[79]
Jang-Soo Lee, Won-Kee Hong, and Shin-Dug Kim. 1999. Design and evaluation of a selective compressed memory system. In International Conference on Computer Design (ICCD’99). IEEE, 184--191.
[80]
Kangmin Lee, Se-Joong Lee, and Hoi-Jun Yoo. 2006. Low-power network-on-chip for high-performance SoC design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14, 2, 148--160.
[81]
Moon-Sang Lee, Young-Jae Kang, Joon-Won Lee, and Seung-Ryoul Maeng. 2002. OPTS: Increasing branch prediction accuracy under context switch. Microprocessors and Microsystems 26, 6, 291--300.
[82]
Sungju Lee, Heegon Kim, Yongwha Chung, and Daihee Park. 2012. Energy efficient image/video data transmission on commercial multi-core processors. Sensors 12, 11, 14647--14670.
[83]
Sangpil Lee, Keunsoo Kim, Gunjae Koo, Hyeran Jeon, Won Woo Ro, and Murali Annavaram. 2015. Warped-compression: Enabling power efficient GPUs through register compression. In Proceedings of the 42nd Annual International Symposium on Computer Architecture. ACM, 502--514.
[84]
Larkhoon Leem, Hyungmin Cho, Jason Bau, Quinn A. Jacobson, and Subhasish Mitra. 2010. ERSA: Error resilient system architecture for probabilistic applications. In Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’10). IEEE, 1560--1565.
[85]
Debra A. Lelewer and Daniel S. Hirschberg. 1987. Data compression. ACM Computing Surveys 19, 3, 261--296.
[86]
Krisda Lengwehasatit and Antonio Ortega. 2004. Scalable variable complexity approximate forward DCT. IEEE Transactions on Circuits and Systems for Video Technology 14, 11, 1236--1248.
[87]
Mikko H. Lipasti and John Paul Shen. 1996. Exceeding the dataflow limit via value prediction. In Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 226--237.
[88]
Mikko H. Lipasti, Christopher B. Wilkerson, and John Paul Shen. 1996. Value locality and load value prediction. ACM SIGOPS Operating Systems Review 30, 5, 138--147.
[89]
Shaoshan Liu, Christine Eisenbeis, and Jean-Luc Gaudiot. 2010. A theoretical framework for value prediction in parallel systems. In 39th International Conference on Parallel Processing (ICPP’10). IEEE, 11--20.
[90]
Song Liu, Karthik Pattabiraman, Thomas Moscibroda, and Benjamin G. Zorn. 2009. Flicker: Saving refresh-power in mobile devices through critical data partitioning. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’09).
[91]
Gabriel H. Loh, Nuwan Jayasena, M. Oskin, Mark Nutter, David Roberts, Mitesh Meswani, Dong Ping Zhang, and Mike Ignatowski. 2013. A processing in memory taxonomy and a case for studying fixed-function pim. In Workshop on Near-Data Processing (WoNDP’13).
[92]
Enrico Magli and Gabriella Olmo. 2003. Lossy predictive coding of SAR raw data. IEEE Transactions on Geoscience and Remote Sensing 41, 5, 977--987.
[93]
Milo M. K. Martin, Daniel J. Sorin, Harold W. Cain, Mark D. Hill, and Mikko H. Lipasti. 2001. Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing. In Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 328--337.
[94]
John M. Mellor-Crummey and Michael L. Scott. 1991. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems 9, 1, 21--65.
[95]
Jiayuan Meng, Srimat Chakradhar, and Anand Raghunathan. 2009. Best-effort parallel execution framework for recognition and mining applications. In IEEE International Symposium on Parallel 8 Distributed Processing (IPDPS’09). IEEE, 1--12.
[96]
Jiayuan Mengte, Anand Raghunathan, Srimat Chakradhar, and Surendra Byna. 2010. Exploiting the forgiving nature of applications for scalable parallel execution. IEEE International Symposium on Parallel 8 Distributed Processing (IPDPS). IEEE.
[97]
Joshua San Miguel, Mario Badr, and Natalie Enright Jerger. 2014. Load value approximation. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 127--139.
[98]
Sasa Misailovic, Michael Carbin, Sara Achour, Zichao Qi, and Martin C. Rinard. 2014. Chisel: Reliability-and accuracy-aware optimization of approximate computational kernels. In ACM SIGPLAN Notices, Vol. 49. ACM, 309--328.
[99]
Sasa Misailovic, Deokhwan Kim, and Martin Rinard. 2013. Parallelizing sequential programs with statistical accuracy tests. ACM Transactions on Embedded Computing Systems 12, 2s, 88.
[100]
Sasa Misailovic, Stelios Sidiroglou, Henry Hoffmann, and Martin Rinard. 2010. Quality of service profiling. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1. ACM, 25--34.
[101]
Sasa Misailovic, Stelios Sidiroglou, and Martin C. Rinard. 2012. Dancing with uncertainty. In Proceedings of the 2012 ACM Workshop on Relaxing Synchronization for Multicore and Manycore Scalability. ACM, 51--60.
[102]
Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Computing Surveys 48, 4, 62.
[103]
Debabrata Mohapatra, Vinay K. Chippa, Anand Raghunathan, and Kaushik Roy. 2011. Design of voltage-scalable meta-functions for approximate computing. In Design, Automation 8 Test in Europe Conference 8 Exhibition (DATE’11). IEEE, 1--6.
[104]
Debabrata Mohapatra, Georgios Karakonstantis, and Kaushik Roy. 2009. Significance driven computation: A voltage-scalable, variation-aware, quality-tuning motion estimator. In Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design. ACM, 195--200.
[105]
Thierry Moreau, Mark Wyse, Jacob Nelson, Adrian Sampson, Hadi Esmaeilzadeh, Luis Ceze, and Mark Oskin. 2015. SNNAP: Approximate computing on programmable socs via neural acceleration. In IEEE 21st International Symposium on High Performance Computer Architecture (HPCA’15). IEEE, 603--614.
[106]
Michel Mouly, Marie-Bernadette Pautet, and Thomas Foreword By-Haug. 1992. The GSM System for Mobile Communications. Telecom Publishing.
[107]
Tarun Nakra, Rajiv Gupta, and Mary Lou Soffa. 1999. Global context-based value prediction. In Proceedings of the 5th International Symposium on High-Performance Computer Architecture. IEEE, 4--12.
[108]
Sriram Narayanan, John Sartori, Rakesh Kumar, and Douglas L. Jones. 2010. Scalable stochastic processors. In Proceedings of the Conference on Design, Automation and Test in Europe. European Design and Automation Association, 335--338.
[109]
D. Nikolopoulos and T. Papatheodorou. 2000. Fast synchronization on scalable cache-coherent multiprocessors using hybrid primitives. In Proceedings of the 14th International Symposium on Parallel and Distributed Processing (IPDPS’00). IEEE Computer Society, Washington, DC, USA, 711.
[110]
Peter Noll. 1997. MPEG digital audio coding. IEEE Signal Processing Magazine 14, 5, 59--81.
[111]
NVIDIA. 2014. NVIDIA GTX 980 Whitepaper. Retrieved November 29, 2017 from https://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF.
[112]
Simon Ogg and Bashir Al-Hashimi. 2006. Improved data compression for serial interconnected network on chip through unused significant bit removal. In 19th International Conference on VLSI Design. Held jointly with 5th International Conference on Embedded Systems and Design. IEEE, 5 pp.
[113]
Soontorn Oraintara, Ying-Jui Chen, and Truong Q. Nguyen. 2002. Integer fast Fourier transform. IEEE Transactions on Signal Processing 50, 3, 607--618.
[114]
David J. Palframan, Nam Sung Kim, and Mikko H. Lipasti. 2014. Precision-aware soft error protection for GPUs. In IEEE 20th International Symposium on High Performance Computer Architecture (HPCA’14). IEEE, 49--59.
[115]
J. Thomas Pawlowski. 2011. Hybrid memory cube (HMC). In Hot Chips, Vol. 23.
[116]
Gennady Pekhimenko, Evgeny Bolotin, Mike O’Connor, Onur Mutlu, Todd C. Mowry, and Steve Keckler. 2015. Toggle-aware compression for GPUs. In IEEE Computer Architecture Letters. 14, 2 (2015), 164--168.
[117]
Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C. Mowry, and Stephen W. Keckler. 2016. A case for toggle-aware compression for GPU systems. In IEEE International Symposium on High Performance Computer Architecture (HPCA’16). IEEE, 188--200.
[118]
Arthur Perais and André Seznec. 2014. EOLE: Paving the way for an effective implementation of value prediction. In ACM SIGARCH Computer Architecture News, Vol. 42. IEEE Press, 481--492.
[119]
Arthur Perais and André Seznec. 2014. Practical data value speculation for future high-end processors. In IEEE 20th International Symposium on High Performance Computer Architecture (HPCA’14). IEEE, 428--439.
[120]
Arthur Perais and André Seznec. 2015. BeBoP: A cost effective predictor infrastructure for superscalar value prediction. In IEEE 21st International Symposium on High Performance Computer Architecture (HPCA’15). IEEE, 13--25.
[121]
Calton Pu and Lenin Singaravelu. 2005. Fine-grain adaptive compression in dynamically variable networks. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS’05). IEEE, 685--694.
[122]
Abbas Rahimi, Amirali Ghofrani, Kwang-Ting Cheng, Luca Benini, and Rajesh K. Gupta. 2015. Approximate associative memristive memory for energy-efficient GPUs. In Proceedings of the 2015 Design, Automation 8 Test in Europe Conference 8 Exhibition. EDA Consortium, 1497--1502.
[123]
Vara Ramakrishnan and Isaac D. Scherson. 1999. Efficient techniques for nested and disjoint barrier synchronization. Journal of Parallel and Distributed Computing 58, 2, 333--356.
[124]
Easwaran Raman, Ram Rangan, David I. August, and others. 2008. Spice: Speculative parallel iteration chunk execution. In Proceedings of the 6th Annual IEEE/ACM International Symposium on Code Generation and Optimization. ACM, 175--184.
[125]
Ashish Ranjan, Swagath Venkataramani, Xuanyao Fong, Kaushik Roy, and Anand Raghunathan. 2015. Approximate storage for energy efficient spintronic memories. In Proceedings of the 52nd Annual Design Automation Conference. ACM, 195.
[126]
Lakshminarayanan Renganarayana, Vijayalakshmi Srinivasan, Ravi Nair, and Daniel Prener. 2012. Programming with relaxed synchronization. In Proceedings of the 2012 ACM Workshop on Relaxing Synchronization for Multicore and Manycore Scalability. ACM, 41--50.
[127]
Martin Rinard. 2013. Parallel synchronization-free approximate data structure construction. In HotPar.
[128]
Martin C. Rinard. 2012. Unsynchronized techniques for approximate parallel computing. In RACES Workshop.
[129]
Antonio Roldao-Lopes, Amir Shahzad, George A. Constantinides, and Eric C. Kerrigan. 2009. More flops or more precision? Accuracy parameterizable linear equation solvers for model predictive control. In 17th IEEE Symposium on Field Programmable Custom Computing Machines (FCCM’09). IEEE, 209--216.
[130]
Cindy Rubio-González, Cuong Nguyen, Hong Diep Nguyen, James Demmel, William Kahan, Koushik Sen, David H. Bailey, Costin Iancu, and David Hough. 2013. Precimonious: Tuning assistant for floating-point precision. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. ACM, 27.
[131]
David Salomon. 2004. Data Compression: The Complete Reference. Springer Science 8 Business Media, New York, NY.
[132]
Mehrzad Samadi, Davoud Anoushe Jamshidi, Janghaeng Lee, and Scott Mahlke. 2014. Paraprox: Pattern-based approximation for data parallel applications. In ACM SIGARCH Computer Architecture News, Vol. 42. ACM, 35--50.
[133]
Mehrzad Samadi, Janghaeng Lee, D. Anoushe Jamshidi, Amir Hormati, and Scott Mahlke. 2013. Sage: Self-tuning approximation for graphics engines. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 13--24.
[134]
Adrian Sampson, Werner Dietl, Emily Fortuna, Danushen Gnanapragasam, Luis Ceze, and Dan Grossman. 2011. EnerJ: Approximate data types for safe and general low-power computation. In ACM SIGPLAN Notices, Vol. 46. ACM, 164--174.
[135]
Adrian Sampson, Jacob Nelson, Karin Strauss, and Luis Ceze. 2014. Approximate storage in solid-state memories. ACM Transactions on Computer Systems 32, 3, 9.
[136]
Jack Sampson, Ruben Gonzalez, Jean-Francois Collard, Norman P. Jouppi, Mike Schlansker, and Brad Calder. 2006. Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, Washington, DC, USA, 235--246.
[137]
Joshua San Miguel and N. Enright Jerger. 2014. Load value approximation: Approaching the ideal memory access latency. In Workshop on Approximate Computing Across the System Stack.
[138]
J. Sartori and R. Kumar. 2010. Low-overhead, high-speed multi-core barrier synchronization. In HiPEAC. 18--34.
[139]
John Sartori and Rakesh Kumar. 2013. Branch and data herding: Reducing control and memory divergence for error-tolerant GPU applications. IEEE Transactions on Multimedia 15, 2, 279--290.
[140]
Vijay Sathish, Michael J. Schulte, and Nam Sung Kim. 2012. Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques. ACM, 325--334.
[141]
Yiannakis Sazeides and James E. Smith. 1997. The predictability of data values. In Proceedings of the 30th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE, 248--258.
[142]
Steven L. Scott. 1996. Synchronization and communication in the T3E multiprocessor. SIGOPS Operating Systems Review 30, 5, 26--36.
[143]
Marko Scrbak, Mahzabeen Islam, Krishna M. Kavi, Mike Ignatowski, and Nuwan Jayasena. 2015. Processing-in-memory: Exploring the design space. In Architecture of Computing Systems (ARCS’15). Springer, 43--54.
[144]
André Seznec. 2011. A new case for the TAGE branch predictor. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 117--127.
[145]
Ali Shafiee, Meysam Taassori, Rajeev Balasubramonian, and A. K. Davis. 2014. MemZip: Exploring unconventional benefits from memory compression. In IEEE 20th International Symposium on High Performance Computer Architecture (HPCA’14). IEEE, 638--649.
[146]
Li Shang, Li-Shiuan Peh, and Niraj K. Jha. 2003. Dynamic voltage scaling with links for power optimization of interconnection networks. In Proceedings of the 9th International Symposium on High-Performance Computer Architecture (HPCA’03). IEEE, 91--102.
[147]
Shisheng Shang and Kai Hwang. 1995. Distributed hardwired barrier synchronization for scalable multiprocessor clusters. IEEE Transactions on Parallel and Distributed Systems 6, 6, 591--605.
[148]
Majid Shoushtari, Abbas BanaiyanMofrad, and Nikil Dutt. 2015. Exploiting partially-forgetful memories for approximate computing. IEEE Embedded Systems Letters 7, 1, 19--22.
[149]
Stelios Sidiroglou-Douskos, Sasa Misailovic, Henry Hoffmann, and Martin Rinard. 2011. Managing performance vs. accuracy trade-offs with loop perforation. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering. ACM, 124--134.
[150]
María Soler and José Flich. 2013. Power saving by NoC traffic compression. In European Conference on Parallel Processing. Springer, 465--476.
[151]
Renée St. Amant, Amir Yazdanbakhsh, Jongse Park, Bradley Thwaites, Hadi Esmaeilzadeh, Arjang Hassibi, Luis Ceze, and Doug Burger. 2014. General-purpose code acceleration with limited-precision analog computation. ACM SIGARCH Computer Architecture News 42, 3, 505--516.
[152]
J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry. 2002. Improving value communication for thread-level speculation. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 65--75.
[153]
Ayswarya Sundaram, Ameen Aakel, Derek Lockhart, Darshan Thaker, and Diana Franklin. 2008. Efficient fault tolerance in multi-media applications through selective instruction replication. In Proceedings of the 2008 Workshop on Radiation Effects and Fault Tolerance in Nanometer Technologies. ACM, 339--346.
[154]
Mark Sutherland, Joshua San Miguel, and Natalie Enright Jerger. 2015. Texture cache approximation on GPUs. In Workshop on Approximate Computing Across the Stack.
[155]
M. B. Taylor. 2012. Is dark silicon useful? Harnessing the four horsemen of the coming dark silicon apocalypse. In Design Automation Conference.
[156]
Bradley Thwaites, Gennady Pekhimenko, Hadi Esmaeilzadeh, Amir Yazdanbakhsh, Onur Mutlu, Jongse Park, Girish Mururu, and Todd Mowry. 2014. Rollback-free value prediction with approximate loads. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation. ACM, 493--494.
[157]
Ye Tian, Qian Zhang, Ting Wang, Feng Yuan, and Qiang Xu. 2015. ApproxMA: Approximate memory access for dynamic precision scaling. In Proceedings of the 25th Edition on Great Lakes Symposium on VLSI. ACM, 337--342.
[158]
Vassilis Vassiliadis, Konstantinos Parasyris, Charalambos Chalios, Christos D. Antonopoulos, Spyros Lalis, Nikolaos Bellas, Hans Vandierendonck, and Dimitrios S. Nikolopoulos. 2015. A programming model and runtime system for significance-aware energy-efficient computing. In ACM SIGPLAN Notices, Vol. 50. ACM, 275--276.
[159]
Swagath Venkataramani, Srimat T. Chakradhar, Kaushik Roy, and Anand Raghunathan. 2015. Approximate computing and the quest for computing efficiency. In Proceedings of the 52nd Annual Design Automation Conference. ACM, 120.
[160]
Swagath Venkataramani, Vinay K. Chippa, Srimat T. Chakradhar, Kaushik Roy, and Anand Raghunathan. 2013. Quality programmable vector processors for approximate computing. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 1--12.
[161]
Swagath Venkataramani, Ashish Ranjan, Kaushik Roy, and Anand Raghunathan. 2014. AxNN: Energy-efficient neuromorphic systems using approximate computing. In Proceedings of the 2014 International Symposium on Low Power Electronics and Design. ACM, 27--32.
[162]
Swagath Venkataramani, Kaushik Roy, and Anand Raghunathan. 2013. Substitute-and-simplify: A unified design paradigm for approximate and quality configurable circuits. In Proceedings of the Conference on Design, Automation and Test in Europe. EDA Consortium, 1367--1372.
[163]
Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita Das, Mahmut Kandemir, Todd C. Mowry, and Onur Mutlu. 2015. A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps. In ACM SIGARCH Computer Architecture News, Vol. 43. ACM, 41--53.
[164]
Oreste Villa, Gianluca Palermo, and Cristina Silvano. 2008. Efficiency and scalability of barrier synchronization on NoC based many-core architectures. In CASES’08: Proceedings of the 2008 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM, New York, NY, USA, 81--90.
[165]
Gregory K. Wallace. 1992. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38, 1, xviii--xxxiv.
[166]
Kai Wang and Manoj Franklin. 1997. Highly accurate data value prediction using hybrid predictors. In Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 281--290.
[167]
Zhou Wang, Eero P. Simoncelli, and Alan C. Bovik. 2003. Multiscale structural similarity for image quality assessment. In Conference Record of the 37th Asilomar Conference on Signals, Systems and Computers, Vol. 2. IEEE, 1398--1402.
[168]
Terry A. Welch. 1984. A technique for high-performance data compression. Computer 17, 6, 8--19.
[169]
Benjamin Welton, Dries Kimpe, Jason Cope, Christina M. Patrick, Kamil Iskra, and Robert Ross. 2011. Improving I/O forwarding throughput with data compression. In IEEE International Conference on Cluster Computing (CLUSTER’11). IEEE, 438--445.
[170]
Yair Wiseman, Karsten Schwan, and Patrick Widener. 2005. Efficient end to end data exchange using configurable compression. ACM SIGOPS Operating Systems Review 39, 3, 4--23.
[171]
Qiang Xu, Todd Mytkowicz, and Nam Sung Kim. 2016. Approximate computing: A survey. IEEE Design & Test 33, 1, 8--22.
[172]
Xin Xu and H. Howie Huang. 2015. Exploring data-level error tolerance in high-performance solid-state drives. IEEE Transactions on Reliability 64, 1, 15--30.
[173]
Amir Yazdanbakhsh, Gennady Pekhimenko, Bradley Thwaites, Hadi Esmaeilzadeh, Onur Mutlu, and Todd C. Mowry. 2016. RFVP: Rollback-free value prediction with safe-to-approximate loads. ACM Transactions on Architecture and Code Optimization 12, 4, 62.
[174]
Rong Ye, Ting Wang, Feng Yuan, Rakesh Kumar, and Qiang Xu. 2013. On reconfiguration-oriented approximate adder design and its application. In Proceedings of the International Conference on Computer-Aided Design. IEEE Press, 48--54.
[175]
Yavuz Yetim, Sharad Malik, and Margaret Martonosi. 2015. CommGuard: Mitigating communication errors in error-prone parallel execution. In ACM SIGPLAN Notices, Vol. 50. ACM, 311--323.
[176]
Yavuz Yetim, Margaret Martonosi, and Sharad Malik. 2013. Extracting useful computation from error-prone processors for streaming applications. In Design, Automation & Test in Europe Conference & Exhibition (DATE’13). IEEE, 202--207.
[177]
Felix Zahn, Steffen Lammel, and Holger Fröning. 2017. Early experiences with saving energy in direct interconnection networks. In IEEE 3rd International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB’17). IEEE, 33--40.
[178]
Felix Zahn, Pedro Yebenes, Steffen Lammel, Pedro J. Garcia, and Holger Fröning. 2016. Analyzing the energy (dis-)proportionality of scalable interconnection networks. In 2nd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB’16). IEEE, 25--32.
[179]
Hang Zhang, Mateja Putic, and John Lach. 2014. Low power GPGPU computation with imprecise hardware. In Proceedings of the 51st Annual Design Automation Conference. ACM, 1--6.
[180]
Qian Zhang, Ting Wang, Ye Tian, Feng Yuan, and Qiang Xu. 2015. ApproxANN: An approximate computing framework for artificial neural network. In Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition. EDA Consortium, 701--706.
[181]
Huiyang Zhou and Thomas M. Conte. 2005. Enhancing memory-level parallelism via recovery-free value prediction. IEEE Transactions on Computers 54, 7, 897--912.
[182]
Qiuling Zhu, Bilal Akin, H. Ekin Sumbul, Fazle Sadi, James C. Hoe, Larry Pileggi, and Franz Franchetti. 2013. A 3D-stacked logic-in-memory accelerator for application-specific data intensive computing. In IEEE International 3D Systems Integration Conference (3DIC’13). IEEE, 1--7.
[183]
Weirong Zhu, Vugranam C. Sreedhar, Ziang Hu, and Guang R. Gao. 2007. Synchronization state buffer: Supporting efficient fine-grain synchronization on many-core architectures. In Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA’07). ACM, New York, NY, USA, 35--45.

Cited By

  • (2024) Mapping Model and Heuristics for Accelerating Deep Neural Networks and for Energy-Efficient Networks-on-Chip. SoutheastCon 2024, 119-126. DOI: 10.1109/SoutheastCon52093.2024.10500232. Online publication date: 15-Mar-2024
  • (2024) Approximate Computing: Concepts, Architectures, Challenges, Applications, and Future Directions. IEEE Access 12, 146022-146088. DOI: 10.1109/ACCESS.2024.3467375. Online publication date: 2024
  • (2024) Adaptive approximate computing in edge AI and IoT applications: A review. Journal of Systems Architecture 150, 103114. DOI: 10.1016/j.sysarc.2024.103114. Online publication date: May-2024

Published In

ACM Computing Surveys  Volume 51, Issue 1
January 2019
743 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3177787
  • Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 January 2018
Accepted: 01 September 2017
Revised: 01 July 2017
Received: 01 July 2016
Published in CSUR Volume 51, Issue 1

Author Tags

  1. Approximate communication
  2. approximate computing
  3. communication reduction
  4. scalability

Qualifiers

  • Survey
  • Research
  • Refereed
