A simple, distributed, modular architecture for a very-large-scale ATM (asynchronous transfer mode) switch is proposed. By extending the concept of the original knockout switch, the cell-filtering and contention-resolution functions are distributed over many small switch elements arranged in a crossbar structure. A novel grouping network partitions the output ports of the switch fabric into a number of groups so that the routing paths within each group can be shared. Sharing paths in this way reduces the number of switch elements by close to one order of magnitude. The proposed ATM switch has a regular and uniform structure and therefore offers the following advantages: (1) easy expansion due to the modular structure, (2) high integration density for the VLSI implementation, (3) relaxed synchronization for data and clock signals, and (4) a center switch fabric built from a single type of chip. Peripheral line concentrators, or statistical multiplexers, can be implemented with the grouping networks and tightly coupled with the switch fabric. This coupling permits the elimination of buffers at the cost of an increase in the number of switch elements in the switch fabric. An experimental prototype circuit design for the key switch element has been completed, and it is shown that more than 4000 such elements can be integrated in a single VLSI chip using existing CMOS technology with feature sizes of 1 μm or below.
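The grouping idea can be illustrated with a minimal behavioral sketch of one time slot. It is not the circuit-level design: the function name `route_slot`, the parameters `group_size` and `paths_per_group`, and the rule that lower-numbered inputs win contention are all assumptions made for illustration. Each arriving cell is filtered to the output group that contains its destination port, and each group accepts only as many cells as it has shared routing paths; the remaining cells are knocked out.

```python
def route_slot(dest_ports, group_size, paths_per_group):
    """Route one time slot of cells through a grouped, knockout-style fabric.

    dest_ports[i] is the destination output port of the cell on input i,
    or None if input i is idle in this slot.
    Returns (accepted, dropped), where accepted maps a group index to the
    list of (input, port) pairs that obtained a routing path.
    """
    accepted = {}
    dropped = []
    for inp, port in enumerate(dest_ports):
        if port is None:
            continue  # idle input, nothing to route
        # Cell filtering: find the output group this cell is addressed to.
        group = port // group_size
        winners = accepted.setdefault(group, [])
        # Contention resolution (assumed rule: lower input index wins).
        if len(winners) < paths_per_group:
            winners.append((inp, port))
        else:
            dropped.append((inp, port))  # no shared path left: knocked out
    return accepted, dropped


if __name__ == "__main__":
    # Eight inputs, output groups of 2 ports, 3 shared paths per group.
    accepted, dropped = route_slot([0, 1, 1, 3, None, 2, 0, 5], 2, 3)
    print(accepted)
    print(dropped)
```

In this example four cells are addressed to group 0 (ports 0 and 1), but only three shared paths exist, so the cell from input 6 is dropped. Cells accepted by a group would still need to be delivered to their individual ports within that group, which this sketch does not model.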