DOI: 10.1109/MICRO56248.2022.00046
research-article

RipTide: A Programmable, Energy-Minimal Dataflow Compiler and Architecture

Published: 18 December 2023

Abstract

Emerging sensing applications create an unprecedented need for energy efficiency in programmable processors. To achieve useful multi-year deployments on a small battery or energy harvester, these applications must avoid off-device communication and instead process most data locally. Recent work has shown coarse-grained reconfigurable arrays (CGRAs) to be a promising architecture for this domain. Unfortunately, nearly all prior CGRAs support only computations with simple control flow and no memory aliasing (e.g., affine inner loops). This causes an Amdahl efficiency bottleneck, because non-trivial fractions of programs must run on an inefficient von Neumann core.
RipTide is a co-designed compiler and CGRA architecture that achieves both high programmability and extreme energy efficiency, eliminating this bottleneck. RipTide provides a rich set of control-flow operators that support arbitrary control flow and memory access on the CGRA fabric. RipTide implements these primitives without tagged tokens to save energy; this requires careful ordering analysis in the compiler to guarantee correctness. RipTide further saves energy and area by offloading most control operations into its programmable on-chip network, where they can reuse existing network switches. RipTide's compiler is implemented in LLVM, and its hardware is synthesized in Intel 22FFL. RipTide compiles applications written in C while saving 25% energy vs. the state-of-the-art energy-minimal CGRA and 6.6× energy vs. a von Neumann core.
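
The Amdahl efficiency bottleneck mentioned in the abstract can be illustrated with a small calculation. The sketch below is not from the paper: the function name, the offload fractions, and the 10× accelerator gain are hypothetical values chosen only to show how energy left on the von Neumann core caps the whole-program saving.

```python
def system_energy_gain(offload_fraction: float, accel_gain: float) -> float:
    """Whole-program energy gain under an Amdahl-style model.

    offload_fraction: share of baseline energy spent in code that can run
        on the accelerator (hypothetical input, between 0 and 1).
    accel_gain: factor by which the accelerator cuts energy for that code
        (hypothetical input, greater than 0).
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be in [0, 1]")
    if accel_gain <= 0.0:
        raise ValueError("accel_gain must be positive")
    # Energy that stays on the core is unchanged; offloaded energy shrinks.
    remaining = (1.0 - offload_fraction) + offload_fraction / accel_gain
    return 1.0 / remaining


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    for fraction in (0.5, 0.9, 1.0):
        gain = system_energy_gain(fraction, accel_gain=10.0)
        print(f"offload {fraction:.0%} of energy -> overall gain {gain:.2f}x")
```

Under this model the overall gain can never exceed 1 / (1 - offload_fraction), however efficient the accelerator is, which is why supporting arbitrary control flow and memory access on the fabric matters.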

      Published In

      MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture
      October 2022
      1498 pages
      ISBN:9781665462723

      Publisher

      IEEE Press

      Author Tags

      1. energy-minimal
      2. ultra-low-power
      3. programmable
      4. general-purpose
      5. reconfigurable
      6. CGRA
      7. dataflow
      8. compiler

      Qualifiers

      • Research-article

      Conference

      MICRO '22
      Acceptance Rates

      Overall Acceptance Rate 484 of 2,242 submissions, 22%

      Article Metrics

      • Downloads (Last 12 months): 98
      • Downloads (Last 6 weeks): 35
      Reflects downloads up to 09 Nov 2024

      Cited By

      • (2024) FlowCert: Translation Validation for Asynchronous Dataflow via Dynamic Fractional Permissions. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (499-526). DOI: 10.1145/3689729. Online publication date: 8-Oct-2024
      • (2024) R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA. ACM Transactions on Reconfigurable Technology and Systems 17:2 (1-34). DOI: 10.1145/3656642. Online publication date: 8-Apr-2024
      • (2024) The Internet of Batteryless Things. Communications of the ACM 67:3 (64-73). DOI: 10.1145/3624718. Online publication date: 22-Feb-2024
      • (2024) Enabling HW-Based Task Scheduling in Large Multicore Architectures. IEEE Transactions on Computers 73:1 (138-151). DOI: 10.1109/TC.2023.3323781. Online publication date: 1-Jan-2024
      • (2023) Pipestitch: An energy-minimal dataflow architecture with lightweight threads. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (1409-1422). DOI: 10.1145/3613424.3614283. Online publication date: 28-Oct-2023
      • (2023) Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (1395-1408). DOI: 10.1145/3613424.3614246. Online publication date: 28-Oct-2023
      • Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless Communication. ACM Transactions on Reconfigurable Technology and Systems. DOI: 10.1145/3695880
