TY - JOUR
T1 - Extending the effective throughput of NoCs with distributed shared-buffer routers
AU - Ramanujam, Rohit Sunkam
AU - Soteriou, Vassos
AU - Lin, Bill
AU - Peh, Li Shiuan
N1 - Funding Information:
Manuscript received July 2, 2010; accepted November 17, 2010. Date of current version March 18, 2011. This paper was recommended by Associate Editor L. Benini. This work was supported in part by the National Science Foundation, under Grant CCF 0702341.
PY - 2011/4
Y1 - 2011/4
AB - Router microarchitecture plays a central role in the performance of networks-on-chip (NoCs). Buffers are needed in routers to house incoming flits that cannot be immediately forwarded due to contention. This buffering can be done at the inputs or the outputs of a router, corresponding to an input-buffered router (IBR) or an output-buffered router (OBR). OBRs are attractive because they can sustain higher throughputs and have lower queuing delays under high loads than IBRs. However, a direct implementation of an OBR requires a router speedup equal to the number of ports, which makes such a design prohibitive given the aggressive clocking needs and limited power budgets of most NoC applications. In this paper, a new router design based on a distributed shared-buffer (DSB) architecture is proposed that aims to practically emulate an OBR. The proposed architecture introduces innovations to address the unique constraints of NoCs, including efficient pipelining and novel flow control. Practical DSB configurations with reduced power overheads and negligible performance degradation are also presented. Compared to a state-of-the-art pipelined IBR, the proposed DSB router achieves up to 19% higher throughput on synthetic traffic and reduces average packet latency by 61% when running SPLASH-2 benchmarks with high contention. On average, the saturation throughput of DSB routers is within 7% of the theoretically ideal saturation throughput under the synthetic workloads evaluated.
KW - Network throughput
KW - networks-on-chip
KW - on-chip interconnection networks
KW - router microarchitecture
UR - http://www.scopus.com/inward/record.url?scp=79953092220&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953092220&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2011.2110550
DO - 10.1109/TCAD.2011.2110550
M3 - Article
AN - SCOPUS:79953092220
SN - 0278-0070
VL - 30
SP - 548
EP - 561
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 4
M1 - 5737868
ER -