This paper presents a high-throughput hardware architecture for H.264/AVC CAVLC encoding. Our scheme eliminates the pipeline stage that computes the coefficient statistics (as adopted by state-of-the-art hardware architectures) by moving this computation into a pre-processing stage performed during quantization, which avoids the extra looping logic in CAVLC. This provides a significant performance improvement over the state of the art (a saving of 16 cycles per 4x4 sub-block). Furthermore, our hardware architecture processes Trailing Ones in parallel (one of the inherently sequential steps in CAVLC) and encodes levels and runs in parallel in the same pipeline stage. An intelligent bitstream writing logic generates the compliant bitstream. Compared to the state of the art, our proposed hardware architecture requires 72% less area and achieves 2x higher throughput while processing HD1080p@30fps.
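To make the notion of coefficient statistics concrete, the following is a minimal software sketch, not the proposed hardware. It assumes the quantized coefficients of one 4x4 sub-block arrive one at a time in zigzag scan order, and it accumulates the three statistics CAVLC needs (the number of non-zero coefficients, the number of trailing ±1 coefficients capped at 3, and the number of zeros before the last non-zero coefficient) in a single pass, so no separate loop over the block is needed afterwards. The function name `cavlc_stats` and its variable names are hypothetical and chosen only for illustration.

```python
def cavlc_stats(coeffs):
    """Accumulate CAVLC statistics in one pass over a 4x4 sub-block.

    coeffs: quantized coefficients in zigzag scan order (assumption).
    Returns (total_coeff, trailing_ones, total_zeros).
    """
    total_coeff = 0     # number of non-zero coefficients
    trailing_ones = 0   # run of +/-1 at the high-frequency end, capped at 3
    total_zeros = 0     # zeros located before the last non-zero coefficient
    zeros_pending = 0   # zeros seen since the most recent non-zero coefficient

    for c in coeffs:
        if c == 0:
            zeros_pending += 1
            continue
        total_coeff += 1
        # Zeros only count once a later non-zero coefficient confirms them.
        total_zeros += zeros_pending
        zeros_pending = 0
        if abs(c) == 1:
            trailing_ones = min(trailing_ones + 1, 3)
        else:
            # A larger magnitude breaks the run of trailing ones.
            trailing_ones = 0

    return total_coeff, trailing_ones, total_zeros


# Example block in zigzag order: 0, 3, 0, 1, -1, -1, 0, 1, then zeros.
block = [0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8
print(cavlc_stats(block))  # prints (5, 3, 3)
```

Because each statistic is updated as a coefficient is produced, the same bookkeeping could in principle run alongside quantization, which is the idea the abstract describes; how the actual architecture realizes this in hardware is not specified here.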