The single-chip crosspoint-queued (CQ) switch is a self-sufficient switching architecture enabled by state-of-art ASIC technology. Unlike the legacy input-queued or output-queued switches, this kind of switch has all its buffers placed at the crosspoints of input and output lines. Scheduling is also performed inside the switching core, and does not rely on instantaneous communications with input or output line-cards. Compared with other legacy switching architectures, the CQ switch has the advantages of high throughput, minimal delay, low scheduling complexity, and no speedup requirement. However, since the crosspoint buffers are small and segregated, packets may be dropped as soon as one of them becomes full. Thus how to efficiently use the crosspoint buffers and decrease the packet drop rate remains a major problem that needs to be addressed. In this paper, we propose a novel chained structure for the CQ switch, which supports load balancing and deflection routing. We also design scheduling algorithms to maintain the correct packet order caused by multi-path switching. All these techniques require modest hardware modifications and memory speedup in the switching core, but can greatly boost the overall buffer utilization and reduce the packet drop rate, especially for large switches with small crosspoint buffers under bursty and non-uniform traffic.