Improved flop tray-based design implementation for power reduction

AB Kahng, J Li, L Wang - 2016 IEEE/ACM International …, 2016 - ieeexplore.ieee.org
2016 IEEE/ACM International Conference on Computer-Aided Design …, 2016
Clock network power reduction is critical in modern SoC designs. Application of flop trays (i.e., multi-bit flip-flops) can significantly reduce the number of sinks in a clock network, and thus reduce the number of clock buffers, clock wirelength, and clock network power. Shared inverters within flop trays also reduce power at the flip-flop level. However, large-size flop trays typically induce placement and routing congestion, and impose additional placement constraints on their fanin/fanout logic cones; this results in power overheads on datapaths. At the same time, to our knowledge, few previous works have studied flop trays with more than four bits. The “chicken-and-egg” loop between flop tray generation and placement optimization is another challenge to flop tray-based design [7]. In this work, we propose an optimization flow to generate and place flop trays from a library of arbitrary given sizes and aspect ratios (ARs), to achieve clock network power reduction. Our optimization starts with an initial placement solution using only single-bit flops. It then performs capacitated K-means clustering to generate solutions with different flop tray sizes and ARs. More specifically, we iteratively use (i) min-cost flow to cluster flops, and (ii) a linear programming-based optimization to determine locations of the generated flop trays. Last, we formulate an integer linear program to select the best combination of flop tray solutions (i.e., sizes and placements) with minimum displacement and number of isolated sinks. Our optimization is aware of flop tray sizes and ARs, as well as timing-critical start-end pairs. Results in foundry 28FDSOI technology show up to 32% total block power reduction as compared to designs using only single-bit flops, up to 16% total block power reduction over designs with flop trays generated by logical clustering during synthesis, and 13% clock power reduction on average compared to the previous work in [10].
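The clustering loop described in the abstract (a min-cost flow to assign flops to trays, alternating with a step that relocates each tray) can be illustrated with a small sketch. This is not the authors' implementation. It assumes integer flop coordinates, a single tray size `cap`, and Manhattan distance as the displacement cost. The relocation step swaps in the coordinate-wise median, which minimizes total Manhattan displacement, in place of the paper's linear program. It ignores aspect ratios and timing-critical pairs, and it omits the final integer linear program that selects among tray solutions. The names `min_cost_assign` and `cluster_flops` are hypothetical.

```python
from collections import deque
from statistics import median_low


def min_cost_assign(flops, centers, cap):
    """Assign every flop to one tray center, at most `cap` flops per center,
    minimizing total Manhattan distance (successive-shortest-path min-cost flow)."""
    n, k = len(flops), len(centers)
    src, snk = n + k, n + k + 1
    # Each edge is [to, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(n + k + 2)]

    def add_edge(u, v, capacity, cost):
        graph[u].append([v, capacity, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i, (fx, fy) in enumerate(flops):
        add_edge(src, i, 1, 0)
        for j, (cx, cy) in enumerate(centers):
            add_edge(i, n + j, 1, abs(fx - cx) + abs(fy - cy))
    for j in range(k):
        add_edge(n + j, snk, cap, 0)

    for _ in range(n):  # one unit of flow per flop
        dist = [None] * len(graph)
        prev = [None] * len(graph)
        dist[src] = 0
        queue, queued = deque([src]), {src}
        while queue:  # queue-based Bellman-Ford on the residual graph
            u = queue.popleft()
            queued.discard(u)
            for edge in graph[u]:
                v, capacity, cost, _ = edge
                if capacity > 0 and (dist[v] is None or dist[u] + cost < dist[v]):
                    dist[v] = dist[u] + cost
                    prev[v] = (u, edge)
                    if v not in queued:
                        queue.append(v)
                        queued.add(v)
        if dist[snk] is None:
            raise ValueError("total tray capacity is smaller than the flop count")
        v = snk
        while v != src:  # push one unit along the shortest path
            u, edge = prev[v]
            edge[1] -= 1
            graph[v][edge[3]][1] += 1
            v = u

    assign = [None] * n
    for i in range(n):
        for v, capacity, _, _ in graph[i]:
            if n <= v < n + k and capacity == 0:  # saturated flop-to-tray edge
                assign[i] = v - n
    return assign


def cluster_flops(flops, centers, cap, iters=10):
    """Alternate capacitated assignment and median relocation until stable."""
    centers = list(centers)
    assign = min_cost_assign(flops, centers, cap)
    for _ in range(iters):
        new_centers = []
        for j, old in enumerate(centers):
            members = [flops[i] for i in range(len(flops)) if assign[i] == j]
            if members:
                xs = [x for x, _ in members]
                ys = [y for _, y in members]
                new_centers.append((median_low(xs), median_low(ys)))
            else:
                new_centers.append(old)  # an empty tray keeps its location
        if new_centers == centers:
            break
        centers = new_centers
        assign = min_cost_assign(flops, centers, cap)
    return assign, centers


if __name__ == "__main__":
    # Two groups of four flops, clustered into two 4-bit trays.
    flops = [(0, 0), (1, 0), (0, 1), (1, 1),
             (10, 10), (11, 10), (10, 11), (11, 11)]
    print(cluster_flops(flops, [(2, 2), (9, 9)], cap=4))
```

Integer coordinates keep the shortest-path comparisons exact. With floating-point costs, a tolerance would be needed in the relaxation test.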