Wearout Resilience in NoCs Through an Aging Aware Adaptive Routing Algorithm
Dean Michael Ancajas, Kshitij Bhardwaj, Koushik Chakraborty, and Sanghamitra Roy

Abstract— Continuous technology scaling has made aging mechanisms, such as negative bias temperature instability and electromigration, primary concerns in network-on-chip (NoC) designs. In this paper, we extensively analyze the effects of these aging mechanisms on NoC routers and links. We observe a critical need for a robust aging-aware routing algorithm that not only reduces the power-performance overheads caused by aging degradation, but also minimizes the stress experienced by heavily utilized routers and links. To solve this problem, we propose an aging-aware adaptive routing algorithm and a router microarchitecture that route packets along paths that are both least congested and least affected by aging degradation. An extensive experimental analysis using real workloads shows average overhead reductions of 13% in network latency and 12.17% in energy–delay product per flit, a 10.4% improvement in performance, and a 60% improvement in mean time to failure with our aging-aware routing algorithm.

Index Terms— Multicore processing, multiprocessor interconnection networks, parallel architecture.

We make the following contributions in this paper.

1) We introduce our slack acquisition circuit (SAC), a crucial part in the design of aging-aware NoCs. The SAC measures delay degradation, which is combined with congestion information to create a robust routing algorithm with minimal power-performance overhead (Section IV).
2) We analyze in more detail our previously proposed aging-aware adaptive routing algorithm [2] and its required router microarchitecture. The adaptive routing algorithm minimizes the degradation of NoC routers while mitigating the power-performance impact of aging. We include new simulation results for mean time to failure (MTTF) and the uniformity of router aging.
3) An extensive experimental analysis using a top-down simulation infrastructure and real workloads (PARSEC benchmarks [3]) indicates average reductions of 13% in network latency and 12.17% in energy–delay product per flit (EDPPF) [4] for a typical NoC undergoing aging degradation. We also obtain an average performance improvement of 10.4% with our proposed algorithm (Section VI).
TABLE I
DIFFERENT DEGRADATION SCHEMES
Fig. 2. Time taken for the network to become faulty under various aging
models (high injection rate).
III. MOTIVATION
In this section, we motivate the need to model the appropriate aging mechanisms when analyzing NoC degradation. We first present a robustness analysis based on the network latency of an NoC under different aging mechanisms. Then, we discuss the performance implications of a degraded NoC.

Fig. 3. SAC. The paths are sampled through a series of delay buffers. The slack information is then combined with congestion information to calculate TTpE.
B. Performance Implications of Degraded NoCs

A degraded network can significantly affect the overall performance of running applications. Fig. 1 shows the performance penalty incurred by applications when flits take 20% more time to route through the network. This experiment uses the same 4 × 4 mesh configuration with the PARSEC benchmark suite. The performance disparity will grow as the size of the network increases.

IV. SLACK ACQUISITION CIRCUIT

In this section, we discuss our SAC, a crucial part in the design of aging-aware NoCs. The SAC measures the delay degradation of an NoC router. This delay degradation, combined with congestion information, is then used by our routing algorithm to decide the optimal path of a packet.

To guide the aging-aware routing algorithm, the SAC measures the increase in delay of each router. The SAC is placed in every stage of the router pipeline (Fig. 4) to capture the minimum slack among all stages. A more detailed diagram of the SAC is shown in Fig. 3. The control unit shown in Fig. 3 changes the multiplexer select signal in each cycle to choose which path to measure. Then, a series of $m$ cascaded delay buffers ($db_1, db_2, \ldots, db_m$) sample the signal at equal time intervals. The state transition captured at the output of each delay buffer provides an estimate of the path delay. Finally, a comparator and the TTpE module [2] obtain the minimum slack after $n$ cycles.
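The delay-buffer measurement described above can be illustrated with a small behavioral model. This is a minimal sketch, not the SAC hardware. It assumes that tap $k$ of the $m$ delay buffers samples the path output at time $k \cdot T/m$, where $T = 1/f$ is the clock period. It also assumes a tap reads true once the path output has settled, and that the earliest settled tap bounds the path delay. The function names are hypothetical.

```python
from typing import List, Sequence


def estimate_path_delay(taps: Sequence[bool], period_ns: float) -> float:
    """Upper-bound path delay from one cycle of tap readings.

    Assumed model: tap k (1-based) of m samples the path output at k * period / m
    and reads True once the output has settled.
    """
    step = period_ns / len(taps)  # equal spacing between consecutive delay buffers
    for k, settled in enumerate(taps, start=1):
        if settled:
            return k * step  # earliest tap that observed the transition
    return period_ns  # no tap saw the transition: treat the whole period as used


def min_slack(samples: List[Sequence[bool]], period_ns: float) -> float:
    """Minimum slack (period minus estimated delay) over n sampled cycles."""
    return min(period_ns - estimate_path_delay(t, period_ns) for t in samples)


if __name__ == "__main__":
    period = 0.5  # 2 GHz target clock
    cycle1 = [False] * 6 + [True] * 4  # settles by tap 7, about 0.35 ns
    cycle2 = [False] * 8 + [True] * 2  # settles by tap 9, about 0.45 ns
    print(min_slack([cycle1, cycle2], period))  # about 0.05 ns
```

Because the multiplexer selects a different path each cycle, taking the minimum over $n$ cycles in this model corresponds to taking the minimum over the sampled paths.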
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Algorithm 1: Aging-Aware Adaptive Routing

The calculation of the router slack is formally defined as follows:

$$S_{\text{router}} = \min(s_1, s_2, \ldots, s_N) \tag{1}$$

$$s_n = \min(s_{p_1}, s_{p_2}, \ldots, s_{p_M}) \tag{2}$$

$$s_{\text{path}} = \frac{1}{f} - c_p \tag{3}$$

where $c_p$ is the path delay at the time of sampling and $f$ is the target frequency of the router, so $1/f$ is the target clock period. $S_{\text{router}}$ is the overall slack of the router, $s_n$ is the overall slack of stage $n$ (one of the $N$ pipeline stages), and $s_{\text{path}}$ is the slack of each path, evaluated with (3) for each of the $M$ sampled paths $p_1, \ldots, p_M$ of a stage. The SAC samples the path delays of the incoming link and the router buffers; incoming links are included in the first stage of the router. To determine the overall minimum slack, the SAC uses a multiplexer to sample different signal paths.
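Equations (1)–(3) compose as nested minima over paths and stages. The following Python sketch evaluates them for one router. The stage names and delay values are hypothetical illustrations, not measurements from this work. Frequencies are taken in GHz so that $1/f$ is in nanoseconds.

```python
from typing import Dict, List


def path_slack(delay_ns: float, freq_ghz: float) -> float:
    """Eq. (3): s_path = 1/f - c_p; with f in GHz, 1/f is the clock period in ns."""
    return 1.0 / freq_ghz - delay_ns


def stage_slack(path_delays_ns: List[float], freq_ghz: float) -> float:
    """Eq. (2): stage slack is the minimum over its M sampled paths."""
    return min(path_slack(d, freq_ghz) for d in path_delays_ns)


def router_slack(stages: Dict[str, List[float]], freq_ghz: float) -> float:
    """Eq. (1): router slack is the minimum over its N pipeline stages."""
    return min(stage_slack(paths, freq_ghz) for paths in stages.values())


if __name__ == "__main__":
    # Hypothetical sampled path delays (ns); the first stage includes the incoming link.
    delays = {
        "stage1_link_and_buffers": [0.41, 0.38],
        "stage2": [0.30],
        "stage3": [0.44, 0.36],
        "stage4": [0.33],
    }
    print(router_slack(delays, freq_ghz=2.0))  # about 0.06 ns, set by the 0.44 ns path
```

At a 2 GHz target (a 0.5 ns period), the single slowest sampled path determines the router slack, which is why the SAC only needs to report the minimum.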
TABLE II
HARDWARE OVERHEADS

1) Process Variation Issues: In smaller technology nodes, process variation can cause the delay buffers to have differing delay resolutions after manufacturing. This can be mitigated by postsilicon tuning that exploits a programmable buffer series [18]. After the ICs are manufactured, the buffer series can be divided into multiple stages so that each stage has the same delay resolution.
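One way to perform the postsilicon grouping described above is sketched below. It is an illustrative greedy heuristic, not the tuning procedure of [18]. Given hypothetical per-buffer delays measured after manufacturing, consecutive buffers are merged until each group reaches a target resolution. The resulting stages therefore have approximately, not exactly, equal delay.

```python
from typing import List


def group_buffers(measured_ps: List[float], target_ps: float) -> List[List[int]]:
    """Greedily merge consecutive buffers until each group's delay reaches target_ps.

    Illustrative heuristic: trailing buffers that cannot reach the target are left unused.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    acc = 0.0
    for i, delay in enumerate(measured_ps):
        current.append(i)
        acc += delay
        if acc >= target_ps:  # this group now spans at least one resolution step
            groups.append(current)
            current, acc = [], 0.0
    return groups


if __name__ == "__main__":
    # Hypothetical post-silicon delays (ps) of seven fine-grained buffers.
    print(group_buffers([12, 9, 14, 8, 11, 10, 7], target_ps=20))
    # [[0, 1], [2, 3], [4, 5]]: group delays of 21, 22, and 21 ps
```

A finer underlying buffer granularity would let the grouped stages match the target resolution more closely.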
To calculate the overhead of our SAC, we implement its Verilog RTL model and integrate it with a well-known NoC router model [19]. Our implementation shows area and power overheads of 1.2% and 0.047%, respectively, using a 45-nm TSMC technology targeted at 2 GHz. For timing, the SAC decreases the overall slack by 8.6%, because the delay buffers are distributed to cover the whole clock period. However, this slack decrease is only experienced at the sampling rate (1 out of $10^{10}$ times at 2 GHz), and hence does not affect the original performance of the router.

V. AGING-AWARE ADAPTIVE ROUTING

In this section, we present a robust aging-aware routing algorithm. Our algorithm reduces the stress on heavily utilized routers and links while adding minimal power-performance overhead.

… this by dividing the routing unit into two stages. The first stage selects the output link based on the policies discussed in Section V-A. In the second stage, the utilization of the selected output link is evaluated. If the link has reached its provisioned traffic in the current epoch, the router inserts recovery cycles; otherwise, the flits are forwarded immediately.
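The two-stage routing unit can be sketched as follows. The stage-1 policies of Section V-A are not reproduced here, so several elements are hypothetical placeholders: the weighted cost that trades off congestion against measured slack, the weight `alpha`, the per-epoch flit budget, and the number of recovery cycles. Only the overall structure follows the text: select a link, then check its provisioned traffic and either stall for recovery or forward immediately.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OutputLink:
    """Per-output-port state seen by the routing unit (all fields hypothetical)."""
    name: str
    slack_ns: float     # latest measured slack toward this link; larger means less aged
    congestion: float   # normalized congestion estimate in [0, 1]
    budget: int         # provisioned flits for the current epoch
    used: int = 0       # flits forwarded so far in the current epoch


def select_link(candidates: List[OutputLink], alpha: float = 0.5) -> OutputLink:
    """Stage 1 (placeholder policy): prefer links that are less congested and less degraded."""
    best_slack = max(max(c.slack_ns for c in candidates), 1e-12)

    def cost(c: OutputLink) -> float:
        degradation = 1.0 - c.slack_ns / best_slack  # 0 for the healthiest candidate
        return alpha * c.congestion + (1.0 - alpha) * degradation

    return min(candidates, key=cost)


def route_flit(candidates: List[OutputLink], recovery_cycles: int = 4) -> Tuple[str, int]:
    """Stage 2: return (chosen link, stall cycles) after checking the link's epoch budget."""
    link = select_link(candidates)
    if link.used >= link.budget:
        # Provisioned traffic reached in this epoch: insert recovery cycles.
        # In this sketch the stalled flit is not counted against the budget.
        return link.name, recovery_cycles
    link.used += 1
    return link.name, 0  # forward immediately


def new_epoch(links: List[OutputLink]) -> None:
    """Reset per-epoch utilization counters at an epoch boundary."""
    for link in links:
        link.used = 0
```

For example, with a two-flit budget on the preferred link, the first two flits are forwarded immediately and the third waits for the recovery cycles until a new epoch resets the counter.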
VI. EXPERIMENTAL RESULTS
Fig. 5. Latency, EDPPF, and performance (lower is better). (a) Network latency. (b) EDPPF. (c) Performance.
Fig. 6. Reliability (higher is better), MTTF (higher is better), and slack standard deviation (lower is better). (a) Reliability. (b) MTTF. (c) Slack standard
deviation.
Fig. 5(a) shows the network latency of all schemes. RCA-1D has the worst latency because it does not implement any aging awareness. RCA-1D repeatedly selects the least congested paths, which then degrade heavily over time, leading to higher packet latency. On the other hand, AGE-ADAP and AGE-ADAP-REC route packets intelligently along paths that are least degraded and least congested, and therefore incur lower overheads. On average, AGE-ADAP-REC reduces the latency by 13% relative to RCA-1D due to recovery cycle insertion.

1) EDPPF: We show the EDPPF overhead in Fig. 5(b). AGE-ADAP and AGE-ADAP-REC achieve a lower overhead than RCA-1D due to their better latency and power profiles. Because of the additional recovery cycles, AGE-ADAP-REC incurs a higher EDPPF than AGE-ADAP. On average, AGE-ADAP-REC reduces EDPPF by 12.17% relative to RCA-1D.

2) Performance: Fig. 5(c) shows the system performance of the schemes relative to RCA-1D. We observe that RCA-1D shows lower performance than the AGE-ADAP and AGE-ADAP-REC schemes. Since RCA-1D is only congestion aware, it selects the least congested paths over the least degraded ones, and therefore incurs higher performance overheads at the system level. Across different benchmarks, AGE-ADAP-REC shows a 10.4% performance improvement over RCA-1D, demonstrating its effectiveness.

F. System MTTF

Fig. 6(b) compares the lifetime achieved with each of our proposed routing schemes. The values are normalized with respect to RCA-1D. Our two schemes are comparable with each other: on average, AGE-ADAP and AGE-ADAP-REC extend the system lifetime by 53% and 60%, respectively, relative to the baseline. The largest improvement is 84% on facesim, while the smallest is 10% on ferret. By limiting the amount of traffic on the routers based on the TTpE, both of our schemes reduce temperature and consequently aging, thereby extending the lifetime of routers that are about to fail. AGE-ADAP-REC has slightly better MTTF values due to its stricter adherence to the TTpE values.

G. Aging Variation

Another way to evaluate the effectiveness of various reliability schemes is to check the slack difference across all routers in the network. Ideally, all components of a reliable system fail at the same time, rather than one component breaking early and taking down the whole system with it. Fig. 6(c) shows the standard deviation of all router slacks after N years. All values are scaled with respect to RCA-1D. Because RCA-1D routes flits based only on congestion, its router slacks vary widely. Our schemes reduce this variation by introducing reliability awareness. AGE-ADAP-REC improves the uniformity of router slacks by 49%, compared with 42% for AGE-ADAP.

VII. CONCLUSION

In this paper, we comprehensively analyze the effects of multiple aging degradation mechanisms on NoC routers and links. Using the TTpE metric, we propose an aging-aware adaptive routing algorithm and router microarchitecture. Extensive experimental analysis incorporating the power-performance impact of aging demonstrates improvements of 13% in network latency and 12.17% in EDPPF for our algorithm. At the system level, our algorithm shows a 10.4% performance improvement for real workloads.

REFERENCES

[1] J. D. Owens, W. J. Dally, R. Ho, D. N. Jayasimha, S. W. Keckler, and L.-S. Peh, “Research challenges for on-chip interconnection networks,” IEEE Micro, vol. 27, no. 5, pp. 96–108, Oct. 2007.
[2] K. Bhardwaj, K. Chakraborty, and S. Roy, “Towards graceful aging degradation in NoCs through an adaptive routing algorithm,” in Proc. 49th IEEE DAC, Jun. 2012, pp. 382–391.
[3] (2010). PARSEC [Online]. Available: http://parsec.cs.princeton.edu/
[4] B. Li, L.-S. Peh, and P. Patra, “Impact of process and temperature variations on network-on-chip design exploration,” in Proc. 2nd IEEE Int. Symp. Netw. Chip, Apr. 2008, pp. 117–126.
[5] S. Borkar, “Thousand core chips: A technology perspective,” in Proc. 44th Annu. DAC, 2007, pp. 746–749.
[6] C.-L. Chou and R. Marculescu, “Farm: Fault-aware resource management in NoC-based multiprocessor platforms,” in Proc. DATE Conf., 2011, pp. 673–678.
[7] F. Chaix, D. Avresky, N.-E. Zergainoh, and M. Nicolaidis, “A fault-tolerant deadlock-free adaptive routing for on chip interconnects,” in Proc. DATE Conf., 2011, pp. 909–912.
[8] S. Pasricha, Y. Zou, D. Connors, and H. J. Siegel, “OE+IOE: A novel turn model based fault tolerant routing scheme for networks-on-chip,” in Proc. 8th Int. Conf. Hardw./Softw. Codes. Syst. Synth., Oct. 2010, pp. 85–94.
[9] S. Murali, D. Atienza, L. Benini, and G. D. Micheli, “A multi-path routing strategy with guaranteed in-order packet delivery and fault-tolerance for networks on chip,” in Proc. 43rd DAC, 2006, pp. 845–848.
[10] S. Manolache, P. Eles, and Z. Peng, “Fault and energy-aware communication mapping with guaranteed latency for applications implemented on NoC,” in Proc. 42nd DAC, Jun. 2005, pp. 266–269.
[11] P. Bogdan and R. Marculescu, “Hitting time analysis for fault-tolerant communication at nanoscale in future multiprocessor platforms,” IEEE Trans. Comput. Aided Des., vol. 30, no. 8, pp. 1197–1210, Aug. 2011.
[12] D. Fick, A. DeOrio, G. K. Chen, V. Bertacco, D. Sylvester, and D. Blaauw, “A highly resilient routing algorithm for fault-tolerant NoCs,” in Proc. DATE Conf., Apr. 2009, pp. 21–26.
[13] W.-C. Tsai, D.-Y. Zheng, S.-J. Chen, and Y. H. Hu, “A fault-tolerant NoC scheme using bidirectional channel,” in Proc. 48th DAC, 2011, pp. 918–923.
[14] Z. Shi, X. Zeng, and Z. Yu, “A scalable and reconfigurable fault-tolerant distributed routing algorithm for NoCs,” IEICE Trans., vol. 94-D, no. 7, pp. 1386–1397, Jun. 2011.
[15] R. Parikh and V. Bertacco, “Formally enhanced runtime verification to ensure NoC functional correctness,” in Proc. 44th Annu. IEEE/ACM Int. Symp. Microarchit., Dec. 2011, pp. 410–419.
[16] A. Prodromou, A. Panteli, C. Nicopoulos, and Y. Sazeides, “Nocalert: An on-line and real-time fault detection mechanism for network-on-chip architectures,” in Proc. 45th Annu. IEEE/ACM Int. Symp. Microarchit., Dec. 2012, pp. 60–71.
[17] A. Sharifi and M. T. Kandemir, “Process variation-aware routing in NoC based multicores,” in Proc. 48th DAC, 2011, pp. 924–929.
[18] K. A. Bowman, J. W. Tschanz, S.-L. Lu, P. A. Aseron, M. M. Khellah, A. Raychowdhury, et al., “A 45 nm resilient microprocessor core for dynamic variation tolerance,” J. Solid-State Circuits, vol. 46, no. 1, pp. 194–208, Jan. 2011.
[19] (2012). Open Source NoC Router RTL [Online]. Available: https://nocs.stanford.edu/cgi-bin/trac.cgi/wiki/Resources/Router
[20] E. Karl, P. Singh, D. Blaauw, and D. Sylvester, “Compact in-situ sensors for monitoring negative-bias-temperature-instability effect and oxide degradation,” in Proc. IEEE ISSCC, Feb. 2008, pp. 410–623.
[21] P. Gratz, B. Grot, and S. W. Keckler, “Regional congestion awareness for load balance in networks-on-chip,” in Proc. IEEE 14th Int. Symp. HPCA, Feb. 2008, pp. 203–214.