The load balanced (LB) switch, a novel architecture recently proposed by C.S. Chang et al., has opened a new avenue for designing large-capacity packet switches. The load balanced switch consists of two stages. First, a load-balancing stage spreads arriving packets equally among all linecards. Then, a forwarding stage transfers packets from the linecards to their final output destinations. The load balanced switch needs no centralized scheduler and can achieve 100% throughput under a broad class of traffic distributions. However, packets may reach the output port out of sequence. Several schemes have been proposed to tackle this out-of-sequence problem, but they are either too complex to implement or introduce a large additional delay. In this paper, we present a practical load balanced switch, called the Byte-Focal switch, which uses packet-by-packet scheduling to significantly improve delay performance over switches of comparable complexity.
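To make the two-stage operation and the resulting out-of-sequence problem concrete, the following is a minimal simulation sketch of a basic load balanced switch, not of the Byte-Focal switch. It rests on assumptions that the text above does not state: both stages follow a fixed round-robin connection pattern (input i connects to linecard (i + t) mod n, and linecard j connects to output (j + t) mod n, in time slot t), and each linecard keeps one FIFO queue per output. The function name `simulate_lb_switch`, the 3-port size, and the arrival pattern are hypothetical and chosen only for illustration.

```python
from collections import deque


def simulate_lb_switch(n, arrivals, slots):
    """Simulate a basic two-stage load balanced switch (illustrative sketch).

    arrivals maps a time slot to a list of (input_port, output_port, seq).
    Returns a dict mapping each output port to the ordered list of
    (input_port, seq) pairs that departed from it.
    """
    # voq[j][k]: packets waiting at linecard j for output k (assumed FIFO).
    voq = [[deque() for _ in range(n)] for _ in range(n)]
    departures = {k: [] for k in range(n)}

    for t in range(slots):
        # Forwarding stage: linecard j is connected to output (j + t) mod n
        # and sends at most one packet that arrived in an earlier slot.
        for j in range(n):
            k = (j + t) % n
            if voq[j][k]:
                departures[k].append(voq[j][k].popleft())

        # Load-balancing stage: input i is connected to linecard (i + t) mod n,
        # regardless of the packet's destination.
        for i, k, seq in arrivals.get(t, []):
            j = (i + t) % n
            voq[j][k].append((i, seq))

    return departures


# Hypothetical traffic: one cross-traffic packet from input 1, then two
# packets (seq 0 and seq 1) of the flow from input 0, all destined to output 0.
arrivals = {
    0: [(1, 0, 0)],
    1: [(0, 0, 0)],
    2: [(0, 0, 1)],
}
departures = simulate_lb_switch(3, arrivals, slots=6)
print(departures[0])  # [(1, 0), (0, 1), (0, 0)]
```

In this run, the flow from input 0 leaves output 0 with seq 1 ahead of seq 0: the first packet queues behind cross traffic at one linecard, while the second lands at an empty linecard that is served sooner. No central scheduler is involved at any point, which is the property that makes the architecture attractive and also the source of the reordering.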