research-article

RipTide: A Programmable, Energy-Minimal Dataflow Compiler and Architecture

Authors:

Graham Gobieski,

Souradip Ghosh,

Nathan Beckmann,

Brandon Lucia

MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 546 - 564

https://doi.org/10.1109/MICRO56248.2022.00046

Published: 18 December 2023

Abstract

Emerging sensing applications create an unprecedented need for energy efficiency in programmable processors. To achieve useful multi-year deployments on a small battery or energy harvester, these applications must avoid off-device communication and instead process most data locally. Recent work has shown coarse-grained reconfigurable arrays (CGRAs) to be a promising architecture for this domain. Unfortunately, nearly all prior CGRAs support only computations with simple control flow and no memory aliasing (e.g., affine inner loops). This causes an Amdahl efficiency bottleneck, because non-trivial fractions of programs must still run on an inefficient von Neumann core.
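The Amdahl bottleneck can be made concrete with a simple energy model. This is an illustrative assumption, not the paper's own analysis: let f be the fraction of a program's baseline von Neumann energy E_vN spent in code the CGRA can execute, and s the CGRA's energy-efficiency gain on that code. The remaining (1 - f) still runs on the core at its original cost.

```latex
E = E_{\mathrm{vN}}\left[(1-f) + \frac{f}{s}\right],
\qquad
\frac{E_{\mathrm{vN}}}{E} = \frac{1}{(1-f) + f/s} \;\le\; \frac{1}{1-f}.
```

With hypothetical numbers: if only half of the energy comes from CGRA-supported code (f = 0.5), the total saving can never exceed 2×, however efficient the fabric is. With f = 0.9 and s = 10, the saving is 1 / (0.1 + 0.09) ≈ 5.3×. Raising f, that is, making more of the program runnable on the fabric, is what lifts the ceiling.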

RipTide is a co-designed compiler and CGRA architecture that achieves both high programmability and extreme energy efficiency, eliminating this bottleneck. RipTide provides a rich set of control-flow operators that support arbitrary control flow and memory access on the CGRA fabric. RipTide implements these primitives without tagged tokens to save energy; this requires careful ordering analysis in the compiler to guarantee correctness. RipTide further saves energy and area by offloading most control operations into its programmable on-chip network, where they can reuse existing network switches. RipTide's compiler is implemented in LLVM, and its hardware is synthesized in Intel 22FFL. RipTide compiles applications written in C, saving 25% energy vs. the state-of-the-art energy-minimal CGRA and 6.6× energy vs. a von Neumann core.
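To illustrate what "control-flow operators without tagged tokens" can mean, here is a minimal Python sketch. It is not RipTide's implementation or ISA. Each graph edge is a plain FIFO of untagged values, and two loop operators in the style of classic dataflow designs are used. `steer` forwards a value only when a decider token matches its polarity and otherwise discards it. `carry` emits its initial input once, forwards loop-back values while its decider is true, and resets when the decider turns false. The operator semantics, the `Graph` class, and the `sum_below` example are hypothetical. Because tokens carry no tags, correctness rests entirely on each FIFO delivering values in order. That is roughly the kind of property the abstract says the compiler's ordering analysis must guarantee.

```python
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque, Dict, List


class Graph:
    """Toy untagged dataflow graph (hypothetical): every input port is a FIFO."""

    def __init__(self) -> None:
        self.fifos: DefaultDict[str, Deque] = defaultdict(deque)
        self.fanout: Dict[str, List[str]] = {}
        self.nodes: List[Callable[[], bool]] = []

    def wire(self, signal: str, *ports: str) -> None:
        # Every token emitted on `signal` is copied to each listed input port.
        self.fanout[signal] = list(ports)

    def emit(self, signal: str, value) -> None:
        for port in self.fanout.get(signal, []):
            self.fifos[port].append(value)

    def ready(self, *ports: str) -> bool:
        return all(self.fifos[p] for p in ports)

    def pop(self, port: str):
        return self.fifos[port].popleft()

    def unary(self, inp: str, out: str, fn: Callable) -> None:
        def fire() -> bool:
            if not self.ready(inp):
                return False
            self.emit(out, fn(self.pop(inp)))
            return True
        self.nodes.append(fire)

    def binary(self, a: str, b: str, out: str, fn: Callable) -> None:
        def fire() -> bool:
            if not self.ready(a, b):
                return False
            self.emit(out, fn(self.pop(a), self.pop(b)))
            return True
        self.nodes.append(fire)

    def steer(self, dec: str, val: str, out: str, polarity: bool) -> None:
        # Forward `val` if the decider equals `polarity`; otherwise drop it.
        def fire() -> bool:
            if not self.ready(dec, val):
                return False
            d, v = self.pop(dec), self.pop(val)
            if d == polarity:
                self.emit(out, v)
            return True
        self.nodes.append(fire)

    def carry(self, init: str, back: str, dec: str, out: str) -> None:
        # Emit `init` once, then loop-back values while the decider is true.
        state = {"in_loop": False}

        def fire() -> bool:
            if not state["in_loop"]:
                if not self.ready(init):
                    return False
                self.emit(out, self.pop(init))
                state["in_loop"] = True
                return True
            if not self.ready(dec):
                return False
            if self.fifos[dec][0]:       # loop continues: wait for loop-back value
                if not self.ready(back):
                    return False
                self.pop(dec)
                self.emit(out, self.pop(back))
            else:                        # loop exits: reset for the next entry
                self.pop(dec)
                state["in_loop"] = False
            return True
        self.nodes.append(fire)

    def run(self) -> None:
        # Fire ready operators round-robin until nothing can fire.
        progress = True
        while progress:
            progress = False
            for fire in self.nodes:
                if fire():
                    progress = True


def sum_below(n: int) -> int:
    """Dataflow version of: s = 0; for (i = 0; i < n; i++) s += i; return s;"""
    g = Graph()
    g.wire("i", "lt.in", "steerTi.val")
    g.wire("cond", "carry_i.dec", "carry_s.dec", "steerTi.dec", "steerTs.dec", "steerFs.dec")
    g.wire("i_body", "inc.in", "add.b")
    g.wire("i_next", "carry_i.back")
    g.wire("s", "steerTs.val", "steerFs.val")
    g.wire("s_body", "add.a")
    g.wire("s_next", "carry_s.back")
    g.wire("result", "out")
    g.carry("carry_i.init", "carry_i.back", "carry_i.dec", "i")
    g.carry("carry_s.init", "carry_s.back", "carry_s.dec", "s")
    g.unary("lt.in", "cond", lambda i: i < n)
    g.steer("steerTi.dec", "steerTi.val", "i_body", True)
    g.unary("inc.in", "i_next", lambda i: i + 1)
    g.steer("steerTs.dec", "steerTs.val", "s_body", True)
    g.binary("add.a", "add.b", "s_next", lambda a, b: a + b)
    g.steer("steerFs.dec", "steerFs.val", "result", False)   # loop exit path
    g.fifos["carry_i.init"].append(0)
    g.fifos["carry_s.init"].append(0)
    g.run()
    return g.fifos["out"].popleft()
```

For example, `sum_below(4)` computes 0 + 1 + 2 + 3 through repeated operator firings and returns 6. A real fabric fires operators in parallel; the round-robin scheduler here is only for simplicity. The result does not depend on firing order, because every operator blocks until its inputs are present and every edge preserves token order.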

Index Terms

RipTide: A Programmable, Energy-Minimal Dataflow Compiler and Architecture
1. Computer systems organization
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Index terms have been assigned to the content through auto-classification.


Information

Published In

ACM Conferences

MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture

October 2022

1498 pages

ISBN:9781665462723

General Chairs:
Nikos Hardavellas, Northwestern University
Simone Campanoni, Northwestern University

Program Chairs:
Boris Grot, University of Edinburgh
Ulya Karpuzcu, University of Minnesota, Twin Cities

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Press

Publication History

Published: 18 December 2023

Qualifiers

Research-article

Conference

MICRO '22

Sponsor:

SIGMICRO

MICRO '22: 55th Annual IEEE/ACM International Symposium on Microarchitecture

October 1-5, 2022

Chicago, Illinois, USA

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 7
Total Downloads: 98
Downloads (last 12 months): 98
Downloads (last 6 weeks): 35

Reflects downloads up to 09 Nov 2024.

Cited By

Lin Z, Gancher J, Parno B (2024). FlowCert: Translation Validation for Asynchronous Dataflow via Dynamic Fractional Permissions. Proceedings of the ACM on Programming Languages 8:OOPSLA2, pp. 499-526. Online publication date: 8-Oct-2024.
https://dl.acm.org/doi/10.1145/3689729

de Bruin B, Vadivel K, Wijtvliet M, Jääskeläinen P, Corporaal H (2024). R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA. ACM Transactions on Reconfigurable Technology and Systems 17:2, pp. 1-34. Online publication date: 8-Apr-2024.
https://dl.acm.org/doi/10.1145/3656642

Ahmed S, Islam B, Yildirim K, Zimmerling M, Pawełczak P, Alizai M, Lucia B, Mottola L, Sorber J, Hester J (2024). The Internet of Batteryless Things. Communications of the ACM 67:3, pp. 64-73. Online publication date: 22-Feb-2024.
https://dl.acm.org/doi/10.1145/3624718

Morais L, Álvarez C, Jiménez-González D, de Haro J, Araujo G, Frank M, Goldman A, Martorell X (2024). Enabling HW-Based Task Scheduling in Large Multicore Architectures. IEEE Transactions on Computers 73:1, pp. 138-151. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1109/TC.2023.3323781

Serafin N, Ghosh S, Desai H, Beckmann N, Lucia B (2023). Pipestitch: An energy-minimal dataflow architecture with lightweight threads. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1409-1422. Online publication date: 28-Oct-2023.
https://dl.acm.org/doi/10.1145/3613424.3614283

Deng J, Tang X, Zhang J, Li Y, Zhang L, Han B, He H, Tu F, Liu L, Wei S, Hu Y, Yin S (2023). Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1395-1408. Online publication date: 28-Oct-2023.
https://dl.acm.org/doi/10.1145/3613424.3614246

Chen K, Nelson T, Khadem A, Fayazi M, Singapuram S, Dreslinski R, Talati N, Kim H, Blaauw D. Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless Communication. ACM Transactions on Reconfigurable Technology and Systems.
https://dl.acm.org/doi/10.1145/3695880
