Packet switches have been extensively studied during the last two decades. Most commercially available packet switches have a single path between each input and output port, which limits the scalability of the switch. To accommodate the exponentially increasing demand of Internet traffic, we propose an ultra-scalable multi-path switch architecture called TrueWay, a multi-plane, multi-stage buffered switch. Packets are delivered between the stages of switch modules under link-to-link flow control, which prevents the next stage's buffers from overflowing. Schemes such as back-pressure, credit-based flow control, and our proposed DQ scheme are discussed. One challenging issue in a multi-path buffered switch is maintaining packet order, which can be resolved by appropriate port-to-port flow control. Schemes such as static hashing, time-stamp-based re-sequencing, dynamic hashing, and window-based re-sequencing are considered. We show by simulation that the TrueWay switch with a speedup of 1.6 performs nearly as well as an output-buffered switch under most traffic distributions of interest. A small-scale prototype switch fabric has been built on a 16-card chassis with high-speed SerDes interconnections at the backplane (640 Gbps capacity) and with FPGA chips on each card, which allow the switch to be reconfigured to test various stage-to-stage and port-to-port flow control schemes. With today's ASIC technology, e.g., a 64×64 switch chip with SerDes interfaces and vertical-cavity surface-emitting laser (VCSEL) optical interconnections, the TrueWay switch can scale up to 40 Tbps.
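As an illustration of the link-to-link flow control mentioned above, the following is a minimal sketch of a generic credit-based scheme between two adjacent stages. It is not the TrueWay implementation or the DQ scheme: the class name `CreditLink`, its methods, and the two-cell buffer size are assumptions chosen for the example. The upstream stage holds one credit per free downstream buffer slot, spends a credit for each cell it sends, and regains a credit when the downstream stage forwards a cell.

```python
from collections import deque


class CreditLink:
    """Hypothetical model of one stage-to-stage link with credit-based flow control."""

    def __init__(self, buffer_cells: int):
        self.capacity = buffer_cells   # size of the downstream buffer (assumed)
        self.credits = buffer_cells    # upstream's count of free downstream slots
        self.downstream = deque()      # cells waiting in the downstream stage

    def send(self, cell) -> bool:
        """Upstream tries to send one cell; it is refused when no credit is left."""
        if self.credits == 0:
            return False               # sending now would overflow the buffer
        self.credits -= 1
        self.downstream.append(cell)
        return True

    def drain(self):
        """Downstream forwards one cell and returns a credit to the upstream stage."""
        if not self.downstream:
            return None
        cell = self.downstream.popleft()
        self.credits += 1
        return cell


link = CreditLink(buffer_cells=2)
results = [link.send(c) for c in ("a", "b", "c")]  # third send finds no credit
drained = link.drain()                             # frees one slot, returns one credit
retry = link.send("c")                             # now accepted
```

In this sketch the credit return is instantaneous. On a real link the credit travels back with some delay, so the downstream buffer generally has to be large enough to cover the round-trip time if the link is to stay fully used.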