We previously proposed the TrueWay switch, a large-capacity, multi-plane, multi-stage buffered packet switch that can scale to hundreds of terabits per second, and prototyped a small-scale version to demonstrate the concept and its feasibility. Three load-balancing schemes were investigated to achieve high throughput and low average delay with a moderate speedup. In this paper, we study one of them, the window-based re-sequencing scheme, without a speedup; it is the most promising of the three in terms of performance. Buffered switch modules are used in the different stages to eliminate the need for centralized scheduling. However, out-of-sequence packet delivery is inevitable because packets are distributed over different paths with different queuing delays. By applying flow control between the input and output ports and limiting the re-sequencing window size (similar to TCP/IP flow control), we keep the implementation cost at an acceptable level while still providing high throughput. Link-level flow control between the switch stages is required to prevent the downstream queues from overflowing. The interaction between link-level flow control at the switch stages and end-to-end flow control at the switch ports is an interesting problem. We show by simulation that the TrueWay switch can be engineered to achieve high throughput without an internal speedup, even under bursty, non-uniform traffic.
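
The window-based re-sequencing idea can be illustrated with a minimal Python sketch. It is not the TrueWay implementation: the class names `WindowSender` and `Resequencer`, the cumulative acknowledgement, and the per-flow sequence numbering are assumptions made for illustration only. The sketch shows how bounding the number of outstanding packets per input-output pair to a window `W` also bounds the re-sequencing buffer at the output port to at most `W` entries.

```python
class WindowSender:
    """Input-port side of one flow (hypothetical model, not the paper's design)."""

    def __init__(self, window):
        self.window = window
        self.next_seq = 0  # next sequence number to assign
        self.base = 0      # oldest sequence number not yet acknowledged

    def can_send(self):
        # At most `window` packets may be outstanding in the fabric.
        return self.next_seq - self.base < self.window

    def send(self):
        """Return the sequence number to stamp on a packet, or None if blocked."""
        if not self.can_send():
            return None
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, delivered_count):
        # Assumed cumulative ACK: number of packets delivered in order so far.
        self.base = max(self.base, delivered_count)


class Resequencer:
    """Output-port side: restores packet order within the window."""

    def __init__(self, window):
        self.window = window
        self.expected = 0  # next sequence number to deliver
        self.pending = {}  # out-of-order packets, keyed by sequence number

    def receive(self, seq, payload):
        """Buffer one packet and return the packets that are now in order."""
        # The sender's window guarantees this range, so `pending` never
        # holds more than `window` packets.
        assert self.expected <= seq < self.expected + self.window
        self.pending[seq] = payload
        delivered = []
        while self.expected in self.pending:
            delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        return delivered
```

With a window of 4, for example, a sender emits sequence numbers 0 to 3 and then blocks; if the fabric delivers them in the order 2, 0, 3, 1, the resequencer releases packet 0 on its arrival and packets 1 to 3 together once packet 1 arrives, after which an acknowledgement reopens the sender's window.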