Deep packet inspection (DPI) is often used in network intrusion detection and prevention systems (NIDPS), where incoming packet payloads are compared against known attack signatures. Processing every single byte in the incoming packet payload has a very stringent time constraint, e.g., 200 ps for a 40-Gbps line. Traditional DPI systems either need a large memory space or use special memory such as ternary content addressable memory (TCAM), limiting parallelism, or yielding high cost/power consumption. In this paper, we present a high-speed, single-chip DPI scheme that is scalable and configurable through memory updates. The scheme is based on a novel data structure called TriBiCa (Trie Bitmap Content Analyzer), which provides minimal perfect hashing functionality. It uses a trie structure with a hash function performed at each layer. Branching is determined by the hashing results with an objective to evenly partition attack signatures into multiple groups at each layer. During a query, as an input traverses the trie, an address to a table in the memory that stores all attack signatures is formed and is used to access the signature for an exact match. Due to the small space required, multiple copies of TriBiCa can be implemented on a single chip to perform pipelining and parallelism simultaneously, thus achieving high throughput. We have designed the TriBiCa on a modest FPGA chip, Xilinx Virtex II Pro, achieving 10-Gbps throughput without using any external memory. A proof-of-concept design is implemented and tested with 1-Gbps packet streams. By using today's state-of-the-art FPGAs, a throughput of 40 Gbps is believed to be achievable.