The emergence of new network applications like network intrusion detection system, packet-level accounting, and load-balancing requires packet classification to report all matched rules, instead of only the best matched rule. Although several schemes have been proposed recently to address the multi-match packet classification problem, most of them require either huge memory or expensive Ternary Content Addressable Memory (TCAM) to store the intermediate data structure, or suffer from steep performance degradation under certain types of classifiers. In this paper, we decompose the operation of multi-match packet classification from the complicated multi-dimensional search to several single-dimensional searches, and present an asynchronous pipeline architecture based on a signature tree structure to combine the intermediate results returned from single-dimensional searches. By spreading edges of the signature tree in multiple hash tables at different stages of the pipeline, the pipeline can achieve a high throughput via the inter-stage parallel access to hash tables. To exploit further intra-stage parallelism, two edge-grouping algorithms are designed to evenly divide the edges associated with each stage into multiple work-conserving hash tables with minimum overhead. Extensive simulation using realistic classifiers and traffic traces shows that the proposed pipeline architecture outperforms HyperCut and B2PC schemes in classification speed by at least one order of magnitude, while with a similar storage requirement. Particularly, with different types of classifiers of 4K rules, the proposed pipeline architecture is able to achieve a throughput between 19.5 Gbps and 91 Gbps.