TY - GEN
T1 - Multi-level pipelined parallel hardware architecture for high throughput motion and disparity estimation in multiview video coding
AU - Zatt, Bruno
AU - Shafique, Muhammad
AU - Bampi, Sergio
AU - Henkel, Jorg
N1 - Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
N2 - This paper presents a novel motion and disparity estimation (ME, DE) scheme in Multiview Video Coding (MVC) that addresses the high throughput challenge jointly at the algorithm and hardware levels. Our scheme is composed of a fast ME/DE algorithm and a multi-level pipelined parallel hardware architecture. The proposed fast ME/DE algorithm exploits the correlation available in the 3D-neighborhood (spatial, temporal, and view). It eliminates the search step for different frames by prioritizing and evaluating the neighborhood predictors. It thereby reduces the coding computations by up to 83% with 0.1 dB quality loss. The proposed hardware architecture further improves the throughput by using parallel ME/DE modules with a shared array of SAD (Sum of Absolute Differences) accelerators and by exploiting the four levels of parallelism inherent to the MVC prediction structure (view, frame, reference frame, and macroblock levels). A multi-level pipeline schedule is introduced to reduce the pipeline stalls. The proposed architecture is implemented for a Xilinx Virtex-6 FPGA and as an ASIC with an IBM 65nm low power technology. It is compared to state-of-the-art at both algorithm and hardware levels. Our scheme achieves a real-time (30fps) ME/DE in 4-view High Definition (HD1080p) encoding with a low power consumption of 81 mW.
AB - This paper presents a novel motion and disparity estimation (ME, DE) scheme in Multiview Video Coding (MVC) that addresses the high throughput challenge jointly at the algorithm and hardware levels. Our scheme is composed of a fast ME/DE algorithm and a multi-level pipelined parallel hardware architecture. The proposed fast ME/DE algorithm exploits the correlation available in the 3D-neighborhood (spatial, temporal, and view). It eliminates the search step for different frames by prioritizing and evaluating the neighborhood predictors. It thereby reduces the coding computations by up to 83% with 0.1 dB quality loss. The proposed hardware architecture further improves the throughput by using parallel ME/DE modules with a shared array of SAD (Sum of Absolute Differences) accelerators and by exploiting the four levels of parallelism inherent to the MVC prediction structure (view, frame, reference frame, and macroblock levels). A multi-level pipeline schedule is introduced to reduce the pipeline stalls. The proposed architecture is implemented for a Xilinx Virtex-6 FPGA and as an ASIC with an IBM 65nm low power technology. It is compared to state-of-the-art at both algorithm and hardware levels. Our scheme achieves a real-time (30fps) ME/DE in 4-view High Definition (HD1080p) encoding with a low power consumption of 81 mW.
UR - http://www.scopus.com/inward/record.url?scp=79957538743&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79957538743&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:79957538743
SN - 9783981080179
T3 - Proceedings -Design, Automation and Test in Europe, DATE
SP - 1448
EP - 1453
BT - Proceedings - Design, Automation and Test in Europe Conference and Exhibition, DATE 2011
T2 - 14th Design, Automation and Test in Europe Conference and Exhibition, DATE 2011
Y2 - 14 March 2011 through 18 March 2011
ER -