Downscaled complementary metal-oxide semiconductor (CMOS) technology feature sizes have enabled massive transistor integration densities. Multi-core chips with billions of transistors are now a reality. However, this rapid increase in on-chip resources has come at the expense of higher susceptibility to defects and wear-out. The inter-router communication links of networks-on-chips (NoCs) are composed of metal wires that are especially vulnerable to catastrophic physical effects such as those of electro-migration, which can even cause link disconnects. To address this hazard, fault-tolerant (FT) routing algorithms sustain on-chip communication by re-routing messages around faulty links, or regions. This work presents a new FT routing scheme that employs a localised re-routing approach. Packets are de-toured around faulty links/regions based on purely local and distributed decisions, and without any global link state knowledge. The algorithm, which is proven to be deadlock-and livelock-free, also handles dynamically occurring faults. Detailed evaluation with synthetic traffic patterns and real applications within a full-system simulation environment demonstrate the efficacy of the new scheme with up to 12% of NoC links being faulty. Synthesis results also prove the feasibility of the proposed protocol at modest hardware and power consumption overheads of only over 5 and 2.5%, respectively.
ASJC Scopus subject areas
- Hardware and Architecture
- Electrical and Electronic Engineering