RAMP: Resource-aware mapping for CGRAs

S Dave, M Balasubramanian… - Proceedings of the 55th …, 2018 - dl.acm.org
Proceedings of the 55th Annual Design Automation Conference, 2018
Coarse-grained reconfigurable arrays (CGRAs) are a promising solution that can accelerate even non-parallel loops. The acceleration achieved through CGRAs critically depends on the quality of the mapping (of loop operations onto the PEs of the CGRA) and, in particular, on the compiler's ability to route the dependencies among operations. Previous works have explored several mechanisms to route data dependencies, including routing through other PEs, registers, and memory, and even re-computation. All these routing options change the graph to be mapped onto PEs (often by adding new operations), and without re-scheduling, it may be impossible to map the new graph. However, existing techniques explore these routing options inside the Place and Route (P&R) phase of the compilation process, which is performed after the scheduling step. As a result, they may either fail to achieve a mapping or obtain poor results. Our method, RAMP, explicitly and intelligently explores the various routing options before the scheduling step, which improves both mappability and mapping quality. Evaluating the top performance-critical loops of MiBench benchmarks over 12 architectural configurations, we find that RAMP is able to accelerate loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over the state of the art.
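The claim that routing changes the graph, and can make a previously valid schedule unmappable, can be illustrated with a toy sketch. This is not RAMP's algorithm. It assumes unit-latency operations, assumes a PE output is only visible to consumers in the next cycle, and uses hypothetical helper names (asap_schedule, route_through_pes, min_ii) along with a made-up four-operation loop body and a 4-PE array.

```python
from collections import defaultdict


def asap_schedule(nodes, edges):
    """Earliest cycle per op, assuming unit latency and unlimited PEs."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    cycle = {}

    def visit(n):
        if n not in cycle:
            cycle[n] = 1 + max((visit(p) for p in preds[n]), default=-1)
        return cycle[n]

    for n in nodes:
        visit(n)
    return cycle


def route_through_pes(nodes, edges, cycle):
    """Insert one routing op per intermediate cycle on every edge that
    spans more than one cycle (assumed routing-through-PEs model)."""
    new_nodes, new_edges = list(nodes), []
    for src, dst in edges:
        prev = src
        for hop in range(cycle[src] + 1, cycle[dst]):
            r = f"route_{src}_{dst}_{hop}"
            new_nodes.append(r)
            new_edges.append((prev, r))
            prev = r
        new_edges.append((prev, dst))
    return new_nodes, new_edges


def min_ii(nodes, num_pes):
    """Resource-constrained lower bound on the initiation interval."""
    return -(-len(nodes) // num_pes)  # ceiling division


# Hypothetical loop body: a chain a -> b -> c -> d plus a long edge a -> d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]

cycle = asap_schedule(nodes, edges)
routed_nodes, routed_edges = route_through_pes(nodes, edges, cycle)

ii_before = min_ii(nodes, num_pes=4)         # 4 ops on 4 PEs
ii_after = min_ii(routed_nodes, num_pes=4)   # 6 ops on 4 PEs
print(ii_before, ii_after)  # prints "1 2"
```

In this sketch the long edge a -> d needs two routing operations, so the graph grows from four to six operations and no longer fits on four PEs at an initiation interval of 1. A schedule fixed before routing was considered would therefore have to be redone, which is the situation the abstract describes.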