research-article

Methods for fault tolerance in networks-on-chip

Published: 11 July 2013

Abstract

Networks-on-Chip constitute the interconnection architecture of future, massively parallel multiprocessors that assemble hundreds to thousands of processing cores on a single chip. Their integration is enabled by the ongoing miniaturization of chip manufacturing technologies following Moore's Law. This miniaturization has a downside: circuit elements become more susceptible to failure. Research on fault-tolerant Networks-on-Chip tries to mitigate partial failure and its effect on network performance and reliability by exploiting various forms of redundancy at suitable network layers. This article reviews the failure mechanisms, fault models, diagnosis techniques, and fault-tolerance methods in on-chip networks, and surveys and summarizes the research of the last ten years. It is structured along three communication layers: the data link, network, and transport layers. The most important results are summarized, and open research problems and challenges are highlighted to guide future research on this topic.

References

[1]
Agarwal, M., Paul, B., Zhang, M., and Mitra, S. 2007. Circuit failure prediction and its application to transistor aging. In Proceedings of the 25th IEEE VLSI Test Symposium. 277--286.
[2]
Aisopos, K., Chen, C.-H., and Peh, L.-S. 2011a. Enabling system-level modeling of variation-induced faults in networks-on-chips. In Proceedings of the 48th ACM/EDAC/IEEE Design Automation Conference (DAC'11). 930--935.
[3]
Aisopos, K., Deorio, A., Peh, L.-S., and Bertacco, V. 2011b. Ariadne: Agnostic reconfiguration in a disconnected network environment. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'11). 298--309.
[4]
Alaghi, A., Karimi, N., Sedghi, M., and Navabi, Z. 2007. Online noc switch fault detection and diagnosis using a high level fault model. In Proceedings of the 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT'07). 21--29.
[5]
Alaghi, A., Sedghi, M., Karimi, N., Fathy, M., and Navabi, Z. 2008. Reliable noc architecture utilizing a robust rerouting algorithms. In 9th IEEE East-West Design and Test Symposium (EWDTS'08).
[6]
Ali, M., Welzl, M., and Hellebrand, S. 2005. A dynamic routing mechanism for network on chip. In Proceedings of the 23rd NORCHIP Conference. 70--73.
[7]
Ali, M., Welzl, M., and Hessler, S. 2007. And end 2 end reliability protocol to address transient faults in network on chips. In Digest of the Workshop on Diagnostic Services in Network-on-Chips.
[8]
Anghel, L. and Nicolaidis, M. 2000. Cost reduction and evaluation of a temporary faults detecting technique. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition. 591--598.
[9]
Angiolini, F., Meloni, P., Carta, S., Benini, L., and Raffo, L. 2006. Contrasting a noc and a traditional interconnect fabric with layout awareness. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE'06). Vol. 1. 1--6.
[10]
Avizienis, A., Laprie, J.-C., Randell, B., and Landwehr, C. 2004. Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secure Comput. 1, 1, 11--33.
[11]
Baumann, R. 2005. Soft errors in advanced computer systems. IEEE Des. Test Comput. 22, 3, 258--266.
[12]
Bell, S., Edwards, B., Amann, J., Conlin, R., Joyce, K., Leung, V., Mackay, J., Reif, M., Bao, L., Brown, J., Mattina, M., Miao, C.-C., Ramey, C., Wentzlaff, D., Anderson, W., Berger, E., Fairbanks, N., Khan, D., Montenegro, F., Stickney, J., and Zook, J. 2008. TILE64 processor: A 64-core SoC with mesh interconnect. In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC'08). 87--90.
[13]
Bertozzi, D., Benini, L., and De Micheli, G. 2002. Low power error resilient encoding for on-chip data buses. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition. 102--109.
[14]
Bertozzi, D., Benini, L., and De Micheli, G. 2005. Error control schemes for on-chip communication links: The energy reliability tradeoff. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 24, 6, 818--831.
[15]
Bjerregaard, T. and Mahadevan, S. 2006. A survey of research and practices of network-on-chip. ACM Comput. Surv. 38, 1--51.
[16]
Bobda, C., Ahmadinia, A., Majer, M., Teich, J., Fekete, S., and Van Der Veen, J. 2005. Dynoc: A dynamic infrastructure for communication in dynamically reconfigurable devices. In Proceedings of the International Field Programmable Logic and Applications Conference. 153--158.
[17]
Bogdan, P., Dumitras, T., and Marculescu, R. 2007. Stochastic communication: A new paradigm for fault-tolerant networks-on-chip. VLSI Des. 2007, 1--17.
[18]
Bolotin, E., Cidon, I., Ginosar, R., and Kolodny, A. 2007. Routing table minimization for irregular mesh nocs. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE'07). 1--6.
[19]
Bondavalli, A., Chiaradonna, S., Giandomenico, F. D., and Grandoni, F. 2000. Threshold-based mechanisms to discriminate transient from intermittent faults. IEEE Trans. Comput. 49, 4, 230--245.
[20]
Boppana, R. V. and Chalasani, S. 1995. Fault-tolerant wormhole routing algorithms for mesh networks. IEEE Trans. Comput. 44, 7, 848--864.
[21]
Borkar, S. 2005. Designing reliable systems from unreliable components: The challenges of transistor variability and degradation. IEEE Micro 25, 6, 10--16.
[22]
Borkar, S. 2007. Thousand core chips: A technology perspective. In Proceedings of the 44th Annual Design Automation Conference (DAC'07). ACM Press, New York, 746--749.
[23]
Boyan, J. and Littman, M. 1994. Packet routing in dynamically changing networks: A reinforcement learning approach. Adv. Neural Inf. Process. Syst. 6, 671--678.
[24]
Breuer, M., Gupta, S., and Mak, T. 2004. Defect and error tolerance in the presence of massive numbers of defects. IEEE Des. Test Comput. 21, 3, 216--227.
[25]
Chen, C.-L. and Chiu, G.-M. 2001. A fault-tolerant routing scheme for meshes with nonconvex faults. IEEE Trans. Parallel Distrib. Syst. 12, 5, 467--475.
[26]
Concatto, C., Matos, D., Carro, L., Kastensmidt, F., Susin, A., Cota, E., and Kreutz, M. 2009. Fault tolerant mechanism to improve yield in nocs using a reconfigurable router. In Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design (SBCCI'09). ACM Press, New York, 1--6.
[27]
Constantinescu, C. 2003. Trends and challenges in vlsi circuit reliability. IEEE Micro 23, 4, 14--19.
[28]
Constantinides, K., Plaza, S., Blome, J., Zhang, B., Bertacco, V., Mahlke, S., Austin, T., and Orshansky, M. 2006. Bulletproof: A defect-tolerant cmp switch architecture. In Proceedings of the 12th International High-Performance Computer Architecture Symposium. 5--16.
[29]
Cota, E., Kastensmidt, F., Cassel, M., Herve, M., Almeida, P., Meirelles, P., Amory, A., and Lubaszewski, M. 2008. A high-fault-coverage approach for the test of data, control and handshake interconnects in mesh networks-on-chip. IEEE Trans. Comput. 57, 9, 1202--1215.
[30]
Cuviello, M., Dey, S., Bai, X., and Zhao, Y. 1999. Fault modeling and simulation for crosstalk in system-on-chip interconnects. In IEEE/ACM International Digest of Technical Papers on Computer-Aided Design. 297--303.
[31]
Dalirsani, A., Holst, S., Elm, M., and Wunderlich, H. 2011. Structural test for graceful degradation of noc switches. In Proceedings of the European Test Symposium (ETS'11). 183--188.
[32]
De Micheli, G. and Benini, L. 2006. Networks On Chips: Technology and Tools. Morgan Kaufmann Publishers.
[33]
Dodd, P. and Massengill, L. 2003. Basic mechanisms and modeling of single-event upset in digital microelectronics. IEEE Trans. Nuclear Sci. 50, 3, 583--602.
[34]
Duan, X., Zhang, D., and Sun, X. 2009. Fault-tolerant routing schemes for wormhole mesh. In Proceedings of the IEEE International Parallel and Distributed Processing with Applications Symposium. 298--301.
[35]
Duato, J., Lysne, O., Pang, R., and Pinkston, T. 2005. Part i: A theory for deadlock-free dynamic network reconfiguration. IEEE Trans. Parallel Distrib. Syst. 16, 5, 412--427.
[36]
Dubrova, E. 2008. Fault-Tolerant Design: An Introduction. Kluwer Academic Publishers.
[37]
Dumitras, T. and Marculescu, R. 2003. On-chip stochastic communication {soc applications}. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE'03). 790--795.
[38]
Dutta, A. and Touba, N. 2007. Reliable network-on-chip using a low cost unequal error protection code. In Proceedings of the 22nd IEEE International Defect and Fault-Tolerance in VLSI Systems Symposium (DFT'07). 3--11.
[39]
Eghbal, A., Yaghini, P. M., Pedram, H., and Zarandi, H. R. 2010. Designing fault-tolerant network-on-chip router architecture. Int. J. Electron. 97, 10, 1181--1192.
[40]
Ejlali, A., Al-Hashimi, B. M., Rosinger, P., and Miremadi, S. G. 2007. Joint consideration of fault-tolerance, energy efficiency and performance in on-chip networks. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE'07). 647--1652.
[41]
Elakkumanan, P., Prasad, K., and Sridhar, R. 2006. Time redundancy based scan flip-flop reuse to reduce ser of combinational logic. In Proceedings of the 7th International Symposium on Quality Electronic Design (ISQED'06). IEEE Computer Society, Los Alamitos, CA, 617--624.
[42]
Ernst, D., Kim, N. S., Das, S., Pant, S., Rao, R., Pham, T., Ziesler, C., Blaauw, D., Austin, T., Flautner, K., and Mudge, T. 2003. Razor: A low-power pipeline based on circuit-level timing speculation. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'03). 7--18.
[43]
Feng, C., Lu, Z., Jantsch, A., Li, J., and Zhang, M. 2010a. FoN: Fault-on-neighbor aware routing algorithm for networks-onchip. In International SOC Conference.
[44]
Feng, C., Lu, Z., Jantsch, A., Li, J., and Zhang, M. 2010b. A reconfigurable fault-tolerant deflection routing algorithm based on reinforcement learning for networks-on-chip. In Proceedings of the International Workshop on Network on Chip Architectures (NoCArc'10).
[45]
Fick, D., Deorio, A., Chen, G., Bertacco, V., Sylvester, D., and Blaauw, D. 2009a. A highly resilient routing algorithm for fault-tolerant nocs. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE'09). 21--26.
[46]
Fick, D., Deorio, A., Hu, J., Bertacco, V., Blaauw, D., and Sylvester, D. 2009b. Vicis: A reliable network for unreliable silicon. In Proceedings of the 46th Annual Design Automation Conference (DAC'09). ACM Press, New York, 812--817.
[47]
Fiorin, L., Micconi, L., and Sami, M. 2011. Design of fault tolerant network interfaces for nocs. In Proceedings of the 14th Euromicro Conference on Digital System Design. 393--400.
[48]
Flich, J., Mejia, A., Lopez, P., and Duato, J. 2007. Region-based routing: An efficient routing mechanism to tackle unreliable hardware in network on chips. In Proceedings of the Symposium on Networks-on-Chip (NOCS'07). 183--194.
[49]
Flich, J., Skeie, T., Mejia, A., Lysne, O., Lopez, P., Robles, A., Duato, J., Koibuchi, M., Rokicki, T., and Sancho, J. 2012. A survey and evaluation of topology-agnostic deterministic routing algorithms. IEEE Trans. Parallel Distrib. Syst. 23, 3, 405--425.
[50]
Forney, G. D. 1973. The viterbi algorithm. Proc. IEEE 61, 3, 268--278.
[51]
Frantz, A., Kastensmidt, F., Carro, L., and Cota, E. 2006a. Dependable network-on-chip router able to simultaneously tolerate soft errors and crosstalk. In Proceedings of the IEEE International Test Conference (ITC'06). 1--9.
[52]
Frantz, A. P., Cassel, M., Kastensmidt, F. L., Cota, E., and Carro, L. 2007. Crosstalk- and seu-aware networks on chips. IEEE Des. Test Comput. 24, 4, 340--350.
[53]
Frantz, A. P., Kastensmidt, F. L., Carro, L., and Cota, E. 2006b. Evaluation of seu and crosstalk effects in network-on-chip switches. In Proceedings of the Symposium on Integrated Circuits and Systems Design (SBCCI'06).
[54]
Fu, B. and Ampadu, P. 2009. On hamming product codes with type-ii hybrid arq for on-chip interconnects. IEEE Trans. Circ. Syst. I: Regular Papers 56, 9, 2042--2054.
[55]
Fukushima, Y., Fukushi, M., and Horiguchi, S. 2009. Fault-tolerant routing algorithm for network on chip without virtual channels. In Proceedings of the 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'09). 313--321.
[56]
Furber, S. 2006. Living with failure: Lessons from nature? In Proceedings of the European Test Symposium (ETS'06). 4--8.
[57]
Gadlage, M., Ahlbin, J., Narasimham, B., Bhuva, B., Massengill, L., Reed, R., Schrimpf, R., and Vizkelethy, G. 2010. Scaling trends in set pulse widths in sub-100 nm bulk cmos processes. IEEE Trans. Nuclear Sci. 57, 6, 3336--3341.
[58]
Ganguly, A., Pande, P. P., and Belzer, B. 2009. Crosstalk-aware channel coding schemes for energy efficient and reliable noc interconnects. IEEE Trans. Very Large Scale Inter Syst. 17, 11, 1626--1639.
[59]
Ganguly, A., Pande, P. P., Belzer, B., and Grecu, C. 2007. Addressing signal integrity in networks on chip interconnects through crosstalk-aware double error correction coding. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI'07). 317--324.
[60]
Gizopoulos, D., Psarakis, M., Adve, S. V., Ramachandran, P., Hari, S. K. S., Sorin, D., Biswas, A. M. A., and Vera, X. 2011. Architectures for online error detection and recovery in multicore processors. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE'11).
[61]
Glass, C. J. and Ni, L. M. 1993. Fault-tolerant wormhole routing in meshes. In Proceedings of the 23rd International Fault-Tolerant Computing Digest of Papers Symposium (FTCS'93). 240--249.
[62]
Grecu, C., Ivanov, A., Pande, R., Jantsch, A., Salminen, E., Ogras, U., and Marculescu, R. 2007. Towards open network-on-chip benchmarks. In Proceedings of the 1st International Symposium on Networks-on-Chip (NOCS'07).
[63]
Grecu, C., Ivanov, A., Saleh, R., and Pande, P. P. 2006a. Noc interconnect yield improvement using crosspoint redundancy. In Proceedings of the 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'06). 457--465.
[64]
Grecu, C., Ivanov, A., Saleh, R., Sogomonyan, E., and Pande, P. P. 2006b. On-line fault detection and location for noc interconnects. In Proceedings of the 12th IEEE International On-Line Testing Symposium (IOLTS'06). 145--150.
[65]
Hazucha, P., Karnik, T., Maiz, J., Walstra, S., Bloechel, B., Tschanz, J., Dermer, G., Hareland, S., Armstrong, P., and Borkar, S. 2003. Neutron soft error rate measurements in a 90-nm cmos process and scaling trends in sram from 0.25-mu;m to 90-nm generation. In IEEE International Electron Devices Meeting Technical Digest (IEDM'03). 21.5.1--21.5.4.
[66]
Hegde, R. and Shanbhag, N. 2000. Toward achieving energy efficiency in presence of deep submicron noise. IEEE Trans. Syst. 8, 4, 379--391.
[67]
Hernandez, C., Federico, F., Santonja, V., and Duato, J. 2009. A new mechanism to deal with process variability in noc links. In Proceedings of the International Parallel and Distributed Processing Symposium (PDPS'09). 1--11.
[68]
Hoskote, Y., Vangal, S., Singh, A., Borkar, N., and Borkar, S. 2007. A 5-ghz mesh interconnect for a teraflops processor. IEEE Micro 27, 5, 51--61.
[69]
Hu, J. and Marculescu, R. 2004. Dyad - smart routing for networks-on-chip. In Proceedings of the 41st Design Automation Conference (DAC'04). 260--263.
[70]
Huffman, W. C. and Pless, V. 2003. Fundamentals of Error-Correcting Codes. Cambridge University Press.
[71]
INTEL LABS. 2010. The scc platform overview. Tech. rep. revision 0.7, Intel Corporation. http://www.intel.la/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-single-chip-platform-overview-paper.pdf.
[72]
ITRS. 2009. International technology roadmap for semiconductors. Tech. rep., ITRS Technology Working Group. http://www.itrs.net/Links/2009ITRS/2009Chapters_2009Tables/2009_Interconnect.pdf.
[73]
Jantsch, A., Lauter, R., and Vitkowski, A. 2005. Power analysis of link level and end-to-end data protection in networks on chip. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS'05). Vol. 2. 1770--1773.
[74]
Jovanovic, S., Tanougast, C., Weber, S., and Bobda, C. 2009. A new deadlock-free fault-tolerant routing algorithm for noc interconnections. In Proceedings of the International Conference on Field Programmable Logic (FPL'09). 326--331.
[75]
Kakoee, M. R., Bertacco, V., and Benini, L. 2011a. A distributed and topology-agnostic approach for on-line noc testing. In Proceedings of the Network on Chip Symposium.
[76]
Kakoee, M. R., Bertacco, V., and Benini, L. 2011b. Relinoc: A reliable network for priority-based on-chip communication. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE'11).
[77]
Keane, J. and Kim, C. 2011. An odometer for cpus. IEEE Spectrum 48, 5, 26--31.
[78]
Keane, J., Kim, T.-H., and Kim, C. H. 2007. An on-chip nbti sensor for measuring pmos threshold voltage degradation. In Proceedings of the International Symposium on Low Power Electronics and Design.
[79]
Kim, J., Nicopoulos, C., Park, D., Narayanan, V., Yousif, M. S., and Das, C. R. 2006. A gracefully degrading and energyefficient modular router architecture for on-chip networks. In Proceedings of the International Symposium on Computer Architecture (ISCA'06). 4--15.
[80]
Kim, J., Park, D., Nicopoulos, C., Vijaykrishnan, N., and Das, C. 2005. Design and analysis of an noc architecture from performance, reliability and energy perspective. In Proceedings of the Symposium on Architecture for Networking and Communications Systems (ANCS'05).
[81]
Kim, Y. B. and Kim, Y.-B. 2007. Fault tolerant source routing for network-on-chip. In Proceedings of the 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT'07). 12--20.
[82]
Kohler, A. and Radetzki, M. 2009. Fault-tolerant architecture and deflection routing for degradable noc switches. In Proceedings of the 3rd ACM/IEEE International Symposium on Networks-on-Chips (NOCS'09). 22--31.
[83]
Kohler, A., Schley, G., and Radetzki, M. 2010. Fault tolerant network on chip switching with graceful performance degradation. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 20, 6, 883--896.
[84]
Koibuchi, M., Matsutani, H., Amano, H., and Pinkston, T. M. 2008. A lightweight fault-tolerant mechanism for networkon-chip. In Proceedings of the 2nd ACM/IEEE International Symposium on Networks-on-Chip (NoCS'08). 13--22.
[85]
Koupaei, F. K., Khademzadeh, A., and Janidarmian, M. 2011. Fault-tolerant application-specific network-on-chip. In Proceedings of the World Congress on Engineering and Computer Science.
[86]
Kuhn, K., Kenyon, C., Kornfeld, A., Liu, M., Maheshwari, A., Kai Shih, W., Sivakumar, S., Taylor, G., Vandervoorn, P., and Zawadzki, K. 2008. Managing process variation in Intel's 45nm CMOS technology. Intel Technol. J. 12, 2.
[87]
Lee, H., Chang, N., Ogras, U., and Marculescu, R. 2007. On-chip communication architecture exploration: A quantitative evaluation of point-to-point, bus, and network-on-chip approaches. ACM Trans. Des. Autom. Electron. Syst. 12, 3.
[88]
Lehtonen, T., Liljeberg, P., and Plosila, J. 2007a. Analysis of forward error correction methods for nanoscale networks-onchip. In Proceedings of the 2nd International Conference on Nano-Networks (Nano-Net'07). Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, 1--5.
[89]
Lehtonen, T., Liljeberg, P., and Plosila, J. 2007b. Online reconfigurable self-timed links for fault tolerant noc. VLSI Des. 2007, 13.
[90]
Lehtonen, T., Wolpert, D., Liljeberg, P., Plosila, J., and Ampadu, P. 2010. Self-adaptive system for addressing permanent errors in on-chip interconnects. IEEE Trans. VLSI Syst. 18, 4, 527--540.
[91]
Lin, S.-Y., Shen, W.-C., Hsu, C.-C., Chao, C.-H., and Wu, A.-Y. 2009. Fault-tolerant router with built-in self-test/self-diagnosis and fault-isolation circuits for 2d-mesh based chip multiprocessor systems. In Proceedings of the International Symposium on VLSI Design, Automation and Test (VLSI-DAT'09). 72--75.
[92]
Lysne, O., Pinkston, T., and Duato, J. 2005. Part ii: A methodology for developing deadlock-free dynamic network reconfiguration processes. IEEE Trans. Parallel Distrib. Syst. 16, 5, 428--443.
[93]
Majer, M., Bobda, C., Ahmadinia, A., and Teich, J. 2005. Packet routing in dynamically changing networks on chip. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium. 154b--154b.
[94]
Malkin, G. and Steenstrup, M. 1995. Distance-vector routing. In Routing in Communication Networks, M. Steenstrup, Ed., Prentice Hall, 83--98.
[95]
Marculescu, R., Ogras, U., Peh, L.-S., Jerger, N., and Hoskote, Y. 2009. Outstanding research problems in noc design: System, microarchitecture, and circuit perspectives. IEEE Trans. Comput. 28, 1, 3--21.
[96]
Mcpherson, J. 2006. Reliability challenges for 45nm and beyond. In Proceedings of the 43rd ACM/IEEE Design Automation Conference (DAC'06). 176--181.
[97]
Mediratta, S. D. and Draper, J. 2007. Performance evaluation of probe-send fault-tolerant network-on-chip router. In Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP'07). 69--75.
[98]
Mejia, A., Flich, J., Duato, J., Reinemo, S.-A., and Skeie, T. 2006. Segment-based routing: An efficient fault-tolerant routing algorithm for meshes and tori. In Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS'06).
[99]
Mejia, A., Palesi, M., Flich, J., Kumar, S., Lopez, P., Holsmark, R., and Duato, J. 2009. Region-based routing: A mechanism to support efficient routing algorithms in nocs. IEEE Trans. Syst. 17, 3, 356--369.
[100]
Mintarno, E., Skaf, J., Zheng, R., Velamala, J. B., Cao, Y., Boyd, S., Dutton, R. W., and Mitra, S. 2011. Selftuning for maximized lifetime energy-efficiency in the presence of circuit aging. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 30, 5, 760--773.
[101]
Miranda, E. and Sune, J. 2004. Electron transport through broken down ultra-thin sio2 layers in mos devices. Microelectron. Reliabil. 44, 1, 1--23.
[102]
Mitra, S., Zhang, M., Waqas, S., Seifert, N., Gill, B., and Kim, K. S. 2006. Combinational logic soft error correction. In Proceedings of the IEEE International Test Conference (ITC'06). 1--9.
[103]
Moy, J. 1995. Link-state routing. In Routing in Communication Networks, M. Ste, Ed., Prentice Hall, 135--157.
[104]
Murali, S., Atienza, D., Benini, L., and De Micheli, G. 2006. A multi-path routing strategy with guaranteed in-order packet delivery and fault-tolerance for networks on chip. In Proceedings of the 43rd ACM/IEEE Design Automation Conference (DAC'06). 845--848.
[105]
Murali, S., Theocharides, T., Vijaykrishnan, N., Irwin, M., Benini, L., and Demicheli, G. 2005. Analysis of error recovery schemes for networks on chips. IEEE Des. Test Comput. 22, 5, 434--442.
[106]
Nicolaidis, M. 1999. Time redundancy based soft-error tolerance to rescue nanometer technologies. In Proceedings of the 17th IEEE VLSI Test Symposium. 86--94.
[107]
Ogras, U., Hu, J., and Marculescu, R. 2005. Key research problems in noc design: A holistic perspective. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'05).
[108]
Owens, J., Dally, W., Ho, R., Jayasimha, D., Keckler, S., and Peh, L.-S. 2007. Research challenges for on-chip interconnection networks. IEEE Micro 27, 5, 96--108.
[109]
Palesi, M., Kumar, S., and Catania, V. 2010. Leveraging partially faulty links usage for enhancing yield and performance in networks-on-chip. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 29, 426--440.
[110]
Pande, P. P., Ganguly, A., Feero, B., Belzer, B., and Grecu, C. 2006. Design of low power and reliable networks on chip through joint crosstalk avoidance and forward error correction coding. In Proceedings of the 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'06). 466--476.
[111]
Parikh, R. and Bertacco, V. 2011. Formally enhanced runtime verification to ensure noc functional correctness. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'11). 410--419.
[112]
Park, D., Nicopoulos, C., Kim, J., Vijaykrishnan, N., and Das, C. R. 2006. Exploring fault-tolerant network-on-chip architectures. In Proceedings of the International Conference on Dependable Systems and Networks (DSN'06). IEEE Computer Society, Los Alamitos, CA, 93--104.
[113]
Patooghy, A. and Miremadi, S. G. 2008. Ltr: A low-overhead and reliable routing algorithm for network on chips. In Proceedings of the International SoC Design Conference (ISOCC'08). Vol. 1.
[114]
Patooghy, A., Miremadi, S. G., and Shafaei, M. 2010. Crosstalk modeling to predict channel elay in network-on-chips. In Proceedings of the IEEE International Conference on Computer Design (ICCD'10). 396--401.
[115]
Pirretti, M., Link, G. M., Brooks, R. R., Vijaykrishnan, N., Kandemir, M. T., and Irwin, M. J. 2004. Fault tolerant algorithms for network-on-chip interconnect. In Proceedings of the International Symposium on VLSI (ISVLSI'04). IEEE Computer Society, Los Alamitos, CA, 46--51.
[116]
Puente, V., Gregorio, J. A., Vallejo, F., and Beivide, R. 2008. Immunet: Dependable routing for interconnection networks with arbitrary topology. IEEE Trans. Comput. 57, 12, 1676--1689.
[117]
Radetzki, M. 2011. Fault-tolerant differential q routing in arbitrary noc topologies. In Proceedings of the International Conference on Embedded and Ubiquitous Computing (EUC'11). 33--40.
[118]
Raik, J., Ubar, R., and Govind, V. 2007. Test configurations for diagnosing faulty links in noc switches. In Proceedings of the 12th IEEE European Test Symposium (ETS'07). 29--34.
[119]
Rantala, V., Lehtonen, T., Liljeberg, P., and Plosila, J. 2009. Multi network interface architectures for fault tolerant network-on-chip. In Proceedings of the International Symposium on Signals, Circuits and Systems. 1--4.
[120]
Ravindran, D. K. 2009. Structural fault-tolerance on the noc circuit level. Tech. rep., Institut fur Technische Informatik, Universitat Stuttgart. June.
[121]
Rodrigo, S., Flich, J., Duato, J., and Hummel, M. 2008. Efficient unicast and multicast support for cmps. In Proceedings of the 41st IEEE/ACM International Symposium on Microarchitecture (MICRO'08). 364--375.
[122]
Rodrigo, S., Flich, J., Roca, A., Medardoni, S., Bertozzi, D., Camacho, J., Silla, F., and Duato, J. 2010. Addressing manufacturing challenges with cost-efficient fault tolerant routing. In Proceedings of the 4th ACM/IEEE International Networks-on-Chip Symposium (NOCS'10). 25--32.
[123]
Rossi, D., Angelini, P., and Metra, C. 2007. Configurable error control scheme for noc signal integrity. In Proceedings of the International On-Line Testing Symposium (IOLTS'07). 43--48.
[124]
Saha, S. 2010. Modeling process variability in scaled cmos technology. IEEE Des. Test Comput. 27, 2, 8--16.
[125]
Sanyo Semiconductors. 2011. Quality and reliability handbook ver 3. http://semicon.sanyo.com/en/reliability/.
[126]
Schroeder, M. D., Birrell, A. D., Burrows, M., Murray, H., Needham, R. M., Rodeheffer, T. L., Satterthwaite, E. H., and Thacker, C. P. 1991. Autonet: A high-speed, self-configuring local area network using point-to-point links. IEEE J. Selected Areas Comm. 9, 8, 1318--1335.
[127]
Schafer, M., Hollstein, T., Zimmer, H., and Glesner, M. 2005. Deadlock-free routing and component placement for irregular mesh-based networks-on-chip. In Proceedings of the International Conference on Computer Aided Design (ICCAD'05). 238--245.
[128]
Schonwald, T., Zimmermann, J., Bringmann, O., and Rosenstiel, W. 2007. Fully adaptive fault-tolerant routing algorithm for network-on-chip architectures. In Proceedings of the 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD'07). 527--534.
[129]
Shamshiri, S., Ghofrani, A., and Cheng, K.-T. 2011. End-to-end error correction and online diagnosis for on-chip networks. In Proceedings of the International Test Conference.
[130]
Shivakumar, P., Kistler, M., Keckler, S. W., Burger, D., and Alvisi, L. 2002. Modeling the effect of technology trends on the soft error rate of combinational logic. In Proceedings of the International Conference on Dependable Systems and Networks.
[131]
Shooman, M. L. 2002. Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design. John Wiley & Sons.
[132]
Song, W., Edwards, D., Nunez-Yanez, J., and Dasgupta, S. 2009. Adaptive stochastic routing in fault-tolerant on-chip networks. In Proceedings of the 3rd ACM/IEEE International Symposium on Networks-on-Chip (NoCS'09). 32--37.
[133]
Sridhara, S. and Shanbhag, N. 2005. Coding for system-on-chip networks: A unified framework. IEEE Trans. VLSI Syst. 13, 6, 655--667.
[134]
Strano, A., Bertozzi, D., Trivino, F., Sanchez, J. L., Alfaro, F. J., and Flich, J. 2012. Osr-lite: Fast and deadlock-free noc reconfiguration framework. In Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modelling and Simulation.
[135]
Takeda, E. and Yang, C. 1995. Hot-Carrier Effects in MOS Devices. Academic Press.
[136]
Tamhankar, R., Murali, S., and De Micheli, G. 2005. Performance driven reliable link design for networks on chips. In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC'05). Vol. 2. 749--754.
[137]
Vangal, S., Howard, J., Ruhl, G., Dighe, S., Wilson, H., Tschanz, J., Finan, D., Iyer, P., Singh, A., Jacob, T., Jain, S., Venkataraman, S., Hoskote, Y., and Borkar, N. 2007. An 80-tile 1.28tflops network-on-chip in 65nm cmos. In Digest of Technical Papers of the IEEE International Solid-State Circuits Conference (ISSCC'07). 98--589.
[138]
Viterbi, A. J. 1971. Convolutional codes and their performance in communication systems. IEEE Trans. Comm. Technol. 19, 5, 751--772.
[139]
Vitkovski, A., Jantsch, A., Lauter, R., Haukilahti, R., and Nilsson, E. 2008. Low-power and error protection coding for network-on-chip traffic. IET Comput. Digital Techn. 2, 6, 483--492.
[140]
Vitkovskiy, A., Soteriou, V., and Nicopoulos, C. 2010. A fine-grained link-level fault-tolerant mechanism for networks-onchip. In Proceedings of the IEEE International Computer Design Conference (ICCD'10). 447--454.
[141]
Walker, M. 2000. Modeling the wiring of deep submicron ics. IEEE Spectrum 37, 3, 65--71.
[142]
Wittmann, R., Puchner, H., Hinh, L., Ceric, H., Gehring, A., and Selberherr, S. 2005. Simulation of dynamic nbti degradation for a 90nm cmos technology. In Proceedings of the Nanotechnology Conference.
[143]
Wu, E., Lai, W., Nowak, E., Mckenna, J., Vayshenker, A., and Harmon, D. 2001. Interplay of voltage and temperature acceleration of oxide breakdown for ultra-thin oxides. Microelectron. Engin. 59, 25--31.
[144]
Wu, J. and Wang, D. 2002. Fault-tolerant and deadlock-free routing in 2-d meshes using rectilinear-monotone polygonal fault blocks. In Proceedings of the International Conference on Parallel Processing. 247--254.
[145]
Xinming, D. and Xuemei, S. 2010. Fault-tolerant routing in a prdt(2,1)-based noc. In Proceedings of the 2nd International Computer Engineering and Technology Conference (ICCET'10).
[146]
Yaghini, P. M., Eghbal, A., Pedram, H., and Zarandi, H. R. 2011. Investigation of transient fault effects in synchronous and asynchronous network on chip router. J. Syst. Archit. 57, 1, 61--68.
[147]
Yang, Y. 2010. Issues of esd protection in nano-scale cmso. Ph.D. thesis, George Mason University, Fairfax, Virginia, USA.
[148]
Yu, A. J. and Lemieux, G. G. 2005. Fpga defect tolerance: Impact of granularity. In Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT'05). 189--196.
[149]
Yu, Q. and Ampadu, P. 2008. Adaptive error control for noc switch-to-switch links in a variable noise environment. In Proceedings of the IEEE International Symposium on Defect and Fault Tolerance of VLSI Systems (DFTVS'08). 352--360.
[150]
Yu, Q. and Ampadu, P. 2010. Transient and permanent error co-management method for reliable networks-on-chip. In Proceedings of the 4th ACM/IEEE International Networks-on-Chip Symposium (NOCS'10). 145--154.
[151]
Yu, Q. and Ampadu, P. 2011. A dual-layer method for transient and permanent error co-management in noc links. IEEE Trans. Circ. Syst. II: Express Briefs 58, 1, 36--40.
[152]
Yu, Q. and Ampadu, P. 2012. Dual-layer adaptive error control for network-on-chip links. IEEE Trans. VLSI. Syst. 20, 7, 1304--1317.
[153]
Yu, Q., Cano, J., Flich, J., and Ampadu, P. 2012. Transient and permanent error control for high-end multiprocessor systems-on-chip. In Proceedings of the 6th IEEE/ACM International Symposium on Networks on Chip (NoCS'12). 169--176.
[154]
Yu, Q., Zhang, B., Li, Y., and Ampadu, P. 2010. Error control integration scheme for reliable noc. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS'10). 3893--3896.
[155]
Yu, Q., Zhang, M., and Ampadu, P. 2011. Exploiting inherent information redundancy to manage transient errors in noc routing arbitration. In Proceedings of the IEEE Network on Chip Symposium (NoCS'11).
[156]
Zhang, B. and Orshansky, M. 2008. Modeling of nbti-induced pmos degradation under arbitrary dynamic temperature variation. In Proceedings of the 9th International Symposium on Quality Electronic Design (ISQED'08). 774--779.
[157]
Zhang, M. and Shanbhag, N. 2006. Soft-error-rate-analysis (sera) methodology. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 25, 10, 2140--2155.
[158]
Zhang, Y. and Jiang, J. 2008. Bibliographical review on reconfigurable fault-tolerant control systems. Ann. Rev. Control 32, 229--252.
[159]
Zhang, Y., Li, H., and Li, X. 2009. Selected crosstalk avoidance code for reliable network-on-chip. J. Comput. Sci. Technol. 24, 6, 1074--1085.
[160]
Zhang, Y., Parikh, D., Sankaranarayanan, K., Skadron, K., and Stan, M. 2003. Hotleakage: A temperature-aware model of subthreshold and gate leakage for architects. Tech. rep. CS-2003-05, University of Virginia, Department of Computer Science. March.
[161]
Zhang, Z., Greiner, A., and Taktak, S. 2008. A reconfigurable routing algorithm for a fault-tolerant 2d-mesh network-on-chip. In Proceedings of the Design Automation Conference (DAC'08). 441--446.
[162]
Zimmer, H. and Jantsch, A. 2003. A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip. In Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis. 188--193.

Published In

ACM Computing Surveys  Volume 46, Issue 1
October 2013
551 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/2522968
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2013
Accepted: 01 January 2013
Revised: 01 September 2012
Received: 01 February 2012
Published in CSUR Volume 46, Issue 1

Author Tags

  1. Network-on-Chip
  2. dependability
  3. diagnosis
  4. failure mechanisms
  5. fault models
  6. fault tolerance
  7. reconfiguration

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Routing in circulant graphs based on a virtual coordinate system. Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki 165:3 (282-293). DOI: 10.26907/2541-7746.2023.3.282-293. Online publication date: 12-Jan-2024
  • (2024) Chip and Package-Scale Interconnects for General-Purpose, Domain-Specific, and Quantum Computing Systems—Overview, Challenges, and Opportunities. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 14:3 (354-370). DOI: 10.1109/JETCAS.2024.3445829. Online publication date: Sep-2024
  • (2023) RescueSNN: enabling reliable executions on spiking neural network accelerators under permanent faults. Frontiers in Neuroscience 17. DOI: 10.3389/fnins.2023.1159440. Online publication date: 12-Apr-2023
  • (2023) Astromorphic self-repair of neuromorphic hardware systems. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence (7821-7829). DOI: 10.1609/aaai.v37i6.25947. Online publication date: 7-Feb-2023
  • (2023) Adaptive Time-Triggered Network-on-Chip Architecture: Enhancing Safety. 2023 3rd International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON) (1-10). DOI: 10.1109/SMARTGENCON60755.2023.10442582. Online publication date: 29-Dec-2023
  • (2023) Systematic Construction of Deadlock-Free Routing for NoC Using Integer Linear Programming. 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) (332-339). DOI: 10.1109/MCSoC60832.2023.00056. Online publication date: 18-Dec-2023
  • (2023) Design of a Fault-Tolerant Pseudo-3D Routing. 2023 IEEE International Test Conference India (ITC India) (1-6). DOI: 10.1109/ITCIndia59034.2023.10235563. Online publication date: 23-Jul-2023
  • (2023) Dynamic routing algorithm to normalize the routers utilization in mesh based NoC. 2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC) (1-6). DOI: 10.1109/ESDC56251.2023.10149856. Online publication date: 4-May-2023
  • (2023) On Conditional Edge-Fault-Tolerant Strong Menger Edge Connectivity Of Folded Hypercubes. The Computer Journal. DOI: 10.1093/comjnl/bxad018. Online publication date: 10-Mar-2023
  • (2023) An improved reconfiguration algorithm for handling 1-point NoC failures. Microprocessors and Microsystems 101 (104910). DOI: 10.1016/j.micpro.2023.104910. Online publication date: Sep-2023
