TY - GEN
T1 - A parallel approach for high performance hardware design of intra prediction in H.264/AVC video codec
AU - Shafique, Muhammad
AU - Bauer, Lars
AU - Henkel, Jörg
N1 - Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2009
Y1 - 2009
AB - The H.264/AVC Intra Frame Codec (i.e. all frames are coded as I-frames) targets high-resolution/high-end encoding applications (e.g. digital cinema and high-quality archiving), providing much better compression efficiency at lower computational complexity than MJPEG2000. Moreover, when coding scenes with very high motion, Intra Macroblocks dominate. Intra Prediction is a compute-intensive and memory-critical part that consumes 80% of the computation time of the entire Intra Compression process when the H.264 encoder executes on a MIPS processor [13]. We therefore present a novel hardware architecture for H.264 Intra Prediction that processes all prediction modes in parallel inside one integrated module (i.e. mode-level parallelism), enabling us to exploit the full optimization space. It employs a group-based write-back scheme that reduces memory transfers in order to facilitate fast mode-decision schemes. Our Luma 4x4 hardware is 3.6x, 5.2x, and 5.5x faster than the state-of-the-art approaches [13], QS0 [14], and [15], respectively. Our results show that, compared with the Dedicated Module Approach [13] (in which each prediction mode is processed by its own independent hardware module, i.e. a typical ASIC style for Intra Prediction), processing Luma 16x16, Chroma 8x8, and Luma 4x4 with the proposed approach is 7.2x, 6.5x, and 1.8x faster, with energy savings of 60%, 80%, and 74%, respectively. We obtain an area saving of 58% for the Luma 4x4 hardware.
UR - http://www.scopus.com/inward/record.url?scp=70350075799&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350075799&partnerID=8YFLogxK
U2 - 10.1109/date.2009.5090889
DO - 10.1109/date.2009.5090889
M3 - Conference contribution
AN - SCOPUS:70350075799
SN - 9783981080155
T3 - Proceedings - Design, Automation and Test in Europe, DATE
SP - 1434
EP - 1439
BT - Proceedings - 2009 Design, Automation and Test in Europe Conference and Exhibition, DATE '09
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2009 Design, Automation and Test in Europe Conference and Exhibition, DATE '09
Y2 - 20 April 2009 through 24 April 2009
ER -