TY - GEN
T1 - BMW Tree
T2 - ACM SIGCOMM 2023 Conference
AU - Yao, Ruyi
AU - Zhang, Zhiyu
AU - Fang, Gaojian
AU - Gao, Peixuan
AU - Liu, Sen
AU - Fan, Yibo
AU - Xu, Yang
AU - Chao, H. Jonathan
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/9/10
Y1 - 2023/9/10
N2 - The Push-In-First-Out (PIFO) queue has been extensively studied as a programmable scheduler. To achieve an accurate, large-scale, and high-throughput PIFO implementation, we propose the Balanced Multi-way (BMW) Sorting Tree for real-time packet sorting. The tree is highly modularized, insertion-balanced, and pipeline-friendly, with autonomous nodes. Based on it, we present two simple and efficient hardware designs. The first is a register-based (R-BMW) scheme. With pipelining, it theoretically sustains a high and stable throughput with no frequency reduction, even as the number of levels grows. We then propose Ranking Processing Units to drive the BMW-Tree (RPU-BMW) to improve scalability, where nodes are stored in SRAMs and dynamically loaded into and out of RPUs. Because the capacity of the BMW-Tree grows exponentially with the number of levels, only a few RPUs are needed for a large scale. The evaluation shows that, when deployed on the Xilinx Alveo U200 card, R-BMW improves the throughput by 4.8x compared to the original PIFO implementation while exhibiting a similar capacity. RPU-BMW is synthesized in a GlobalFoundries 28 nm process, costing a modest 0.522% (1.043 mm²) of chip area and 0.57 MB of off-chip memory to support 87k flows at 200 Mpps. To the best of our knowledge, RPU-BMW is the first accurate PIFO implementation to support over 80k flows at rates as high as 200 Mpps.
AB - The Push-In-First-Out (PIFO) queue has been extensively studied as a programmable scheduler. To achieve an accurate, large-scale, and high-throughput PIFO implementation, we propose the Balanced Multi-way (BMW) Sorting Tree for real-time packet sorting. The tree is highly modularized, insertion-balanced, and pipeline-friendly, with autonomous nodes. Based on it, we present two simple and efficient hardware designs. The first is a register-based (R-BMW) scheme. With pipelining, it theoretically sustains a high and stable throughput with no frequency reduction, even as the number of levels grows. We then propose Ranking Processing Units to drive the BMW-Tree (RPU-BMW) to improve scalability, where nodes are stored in SRAMs and dynamically loaded into and out of RPUs. Because the capacity of the BMW-Tree grows exponentially with the number of levels, only a few RPUs are needed for a large scale. The evaluation shows that, when deployed on the Xilinx Alveo U200 card, R-BMW improves the throughput by 4.8x compared to the original PIFO implementation while exhibiting a similar capacity. RPU-BMW is synthesized in a GlobalFoundries 28 nm process, costing a modest 0.522% (1.043 mm²) of chip area and 0.57 MB of off-chip memory to support 87k flows at 200 Mpps. To the best of our knowledge, RPU-BMW is the first accurate PIFO implementation to support over 80k flows at rates as high as 200 Mpps.
KW - networking hardware
KW - programmable packet scheduler
UR - http://www.scopus.com/inward/record.url?scp=85174072214&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174072214&partnerID=8YFLogxK
U2 - 10.1145/3603269.3604862
DO - 10.1145/3603269.3604862
M3 - Conference contribution
AN - SCOPUS:85174072214
T3 - SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference
SP - 208
EP - 219
BT - SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference
PB - Association for Computing Machinery, Inc
Y2 - 10 September 2023 through 14 September 2023
ER -