TY - GEN
T1 - Design of a high-throughput distributed shared-buffer NoC router
AU - Ramanujam, Rohit Sunkam
AU - Soteriou, Vassos
AU - Lin, Bill
AU - Peh, Li Shiuan
PY - 2010
Y1 - 2010
N2 - Router microarchitecture plays a central role in the performance of an on-chip network (NoC). Buffers are needed in routers to house incoming flits that cannot be immediately forwarded due to contention. This buffering can be done at the inputs or the outputs of a router, corresponding to an input-buffered router (IBR) or an output-buffered router (OBR). OBRs are attractive because they can sustain higher throughputs and have lower queuing delays under high loads than IBRs. However, a direct implementation of an OBR requires a router speedup equal to the number of ports, making such a design prohibitive under the aggressive clocking needs and limited power budgets of most NoC applications. In this paper, we propose a new router design that aims to emulate an OBR practically, based on a distributed shared-buffer (DSB) router architecture. We introduce innovations to address the unique constraints of NoCs, including efficient pipelining and novel flow control. We also present practical DSB configurations that can reduce the power overhead with negligible degradation in performance. The proposed DSB router achieves up to 19% higher throughput on synthetic traffic and reduces packet latency by 60% on average for SPLASH-2 benchmarks with high contention, compared to a state-of-the-art pipelined IBR. On average, the saturation throughput of DSB routers is within 10% of the theoretically ideal saturation throughput under the synthetic workloads evaluated.
AB - Router microarchitecture plays a central role in the performance of an on-chip network (NoC). Buffers are needed in routers to house incoming flits that cannot be immediately forwarded due to contention. This buffering can be done at the inputs or the outputs of a router, corresponding to an input-buffered router (IBR) or an output-buffered router (OBR). OBRs are attractive because they can sustain higher throughputs and have lower queuing delays under high loads than IBRs. However, a direct implementation of an OBR requires a router speedup equal to the number of ports, making such a design prohibitive under the aggressive clocking needs and limited power budgets of most NoC applications. In this paper, we propose a new router design that aims to emulate an OBR practically, based on a distributed shared-buffer (DSB) router architecture. We introduce innovations to address the unique constraints of NoCs, including efficient pipelining and novel flow control. We also present practical DSB configurations that can reduce the power overhead with negligible degradation in performance. The proposed DSB router achieves up to 19% higher throughput on synthetic traffic and reduces packet latency by 60% on average for SPLASH-2 benchmarks with high contention, compared to a state-of-the-art pipelined IBR. On average, the saturation throughput of DSB routers is within 10% of the theoretically ideal saturation throughput under the synthetic workloads evaluated.
KW - On-chip interconnection networks
KW - Router microarchitecture
UR - http://www.scopus.com/inward/record.url?scp=77955111275&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955111275&partnerID=8YFLogxK
U2 - 10.1109/NOCS.2010.17
DO - 10.1109/NOCS.2010.17
M3 - Conference contribution
AN - SCOPUS:77955111275
SN - 9780769540535
T3 - NOCS 2010 - The 4th ACM/IEEE International Symposium on Networks-on-Chip
SP - 69
EP - 78
BT - NOCS 2010 - The 4th ACM/IEEE International Symposium on Networks-on-Chip
T2 - 4th ACM/IEEE International Symposium on Networks on Chip, NOCS 2010
Y2 - 3 May 2010 through 6 May 2010
ER -