This paper proposes a recursive modular architecture for implementing a large-scale Multicast Output Buffered ATM Switch (MOBAS). Many proposed multicast switch architectures are limited in size because they rely on (1) a centralized processing unit for cell replication and routing, (2) a shared medium for cell transmission and storage, or (3) an irregular interconnection network for switching. In the proposed architecture, by contrast, the four major functions of a multicast switch (cell replication, cell routing, cell contention resolution, and cell addressing) are all performed in a distributed manner, so that a large switch size is achievable. The Multicast Knockout Principle, an extension of the Generalized Knockout Principle, is applied in constructing the entire switch fabric to reduce the hardware complexity (e.g., the number of switch elements and interconnection wires) by almost one order of magnitude. The proposed MOBAS has a regular and uniform structure and thus offers the following advantages: (1) easy expansion due to the modular structure, (2) high integration density for VLSI implementation, (3) relaxed synchronization for data and clock signals, and (4) construction of the center switch fabric from a single type of chip. A two-stage structure of the MOBAS is described. The cell loss probability of the switch fabric is analyzed, and some numerical results are shown. A 16 × 16 ATM crosspoint switch chip based on the proposed architecture has been implemented in 2-μm CMOS technology and tested to operate correctly.
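
As a rough illustration of why a knockout-style fabric can drop most of its internal paths at little cost, the sketch below evaluates the cell loss probability of the classical unicast knockout concentrator. It is not the multicast analysis of this paper. It assumes uniform, independent Bernoulli arrivals on every input; the function name `knockout_loss` and its parameters are hypothetical and chosen only for this example.

```python
from math import comb


def knockout_loss(n_inputs: int, load: float, paths: int) -> float:
    """Cell loss probability at one output of a classical (unicast) knockout switch.

    Assumptions (not taken from the paper): each of the n_inputs inputs
    carries a cell with probability `load` per slot, destinations are uniform,
    and at most `paths` cells per slot can reach a given output; the excess
    cells are dropped ("knocked out").
    """
    # Probability that a given input sends a cell to this particular output.
    p = load / n_inputs
    expected_lost = 0.0
    # k cells arrive for the output; k - paths of them are lost when k > paths.
    for k in range(paths + 1, n_inputs + 1):
        prob_k = comb(n_inputs, k) * p**k * (1 - p) ** (n_inputs - k)
        expected_lost += (k - paths) * prob_k
    # Normalize by the expected number of cells offered to the output per slot.
    return expected_lost / load


if __name__ == "__main__":
    for paths in (1, 2, 4, 8, 12):
        print(paths, knockout_loss(16, 0.9, paths))
```

Under these assumptions the loss falls off steeply as `paths` grows, which is the intuition behind keeping far fewer than N paths per output and thereby saving switch elements and wires.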