A buffered crossbar switch, that is, a crossbar switch with a packet buffer at each crosspoint, promises good delay performance with much simpler, practical scheduling algorithms. With today's technology, such a switch can be implemented on a single chip, and it has therefore attracted considerable attention recently. Although simple distributed algorithms can achieve 100% throughput under uniform traffic, so far no distributed algorithm is known to achieve 100% throughput under general admissible arrival patterns. In this paper, we propose DISQUO (DIStributed QUeue input-Output scheduler), a distributed scheduling algorithm that achieves 100% throughput for any admissible Bernoulli arrival traffic. To the best of our knowledge, it is the first distributed algorithm to do so. Our simulation results also show that DISQUO provides good delay performance under different traffic patterns.