The load-balanced (LB) switch proposed by C.S. Chang et al. consists of two stages. First, a load-balancing stage spreads arriving packets equally among all linecards. Then, a forwarding stage transfers packets from the linecards to their final output destinations. The load-balanced switch needs no centralized scheduler and can achieve 100% throughput under a broad class of traffic distributions. In this paper, we analyze a practical load-balanced switch, called the Byte-Focal switch, which uses packet-by-packet scheduling to significantly improve delay performance over switches of comparable complexity. We analyze the average delay of each stage in the Byte-Focal switch. We show that the average queueing delay is roughly linear in the switch size $N$ and that, although the worst-case resequencing delay is $N^2$, the average resequencing delay is much smaller. This means that the required resequencing buffer size can be reduced significantly.
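
To make the two-stage structure concrete, the following is a minimal sketch of a generic load-balanced switch, not of the Byte-Focal switch itself. It assumes a deterministic round-robin connection pattern in both stages (in slot t, input i connects to linecard (i + t) mod N and linecard j connects to output (j + t) mod N) and per-output FIFO queues at each linecard. The function name `simulate_lb_switch` and its parameters are hypothetical and introduced only for illustration.

```python
from collections import deque


def simulate_lb_switch(n, arrivals, extra_slots=0):
    """Simulate a generic two-stage load-balanced switch (illustrative sketch).

    n           -- switch size (inputs, linecards, and outputs all number n)
    arrivals    -- list indexed by time slot; arrivals[t] maps input -> output
                   for the (at most one) packet arriving at that input in slot t
    extra_slots -- idle slots appended after the last arrival to drain queues

    Returns a list of (depart_slot, arrival_slot, input, output) tuples.
    """
    # voq[j][k] holds packets waiting at linecard j for output k.
    voq = [[deque() for _ in range(n)] for _ in range(n)]
    departures = []

    for t in range(len(arrivals) + extra_slots):
        # Stage 1 (load balancing): input i is connected to linecard (i + t) mod n,
        # assumed round-robin pattern, independent of the packet's destination.
        if t < len(arrivals):
            for i, out in arrivals[t].items():
                j = (i + t) % n
                voq[j][out].append((t, i, out))

        # Stage 2 (forwarding): linecard j is connected to output (j + t) mod n
        # and sends the head-of-line packet for that output, if any.
        for j in range(n):
            k = (j + t) % n
            if voq[j][k]:
                arrival_slot, i, out = voq[j][k].popleft()
                departures.append((t, arrival_slot, i, out))

    return departures


if __name__ == "__main__":
    # One packet from input 0 to output 1 in a 2x2 switch.
    print(simulate_lb_switch(2, [{0: 1}], extra_slots=2))
```

Neither stage inspects global state, which is why no centralized scheduler is needed. Because consecutive packets of one flow are sent to different linecards with different queue occupancies, they can leave out of order, and that is the source of the resequencing delay analyzed in this paper.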