TY - GEN
T1 - ADAPT
T2 - 2015 20th Asia and South Pacific Design Automation Conference, ASP-DAC 2015
AU - Zhang, Xi
AU - Javaid, Haris
AU - Shafique, Muhammad
AU - Ambrose, Jude Angelo
AU - Henkel, Jorg
AU - Parameswaran, Sri
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/3/11
Y1 - 2015/3/11
N2 - Future on-chip manycore systems are expected to have hundreds of cores, and to be used for a number of applications to amortize their fabrication costs. In this paper, we examine how software pipelines, which are useful for streaming/multimedia applications, can be efficiently executed on a manycore system with shared memory. The goal is to balance the stages of the pipeline under workload and resource variations. This paper presents ADAPT, a method to quickly detect bottleneck stages and add cores (workers) to those bottleneck stages at run-time. Further, if there are no idle workers, then a shuffling of workers across stages is performed to improve/maintain throughput. ADAPT is implemented in a 48-core system which is built using a commercial core and tool suite. For a variety of applications, ADAPT takes less than 2 μs for one run-time adaptation, and achieves up to 2.1× the throughput of a state-of-the-art method (which is modified and implemented in the same system for a fair comparison). These results illustrate the applicability of ADAPT for fine-grained run-time management of manycore systems to achieve high throughput for software pipelines.
UR - http://www.scopus.com/inward/record.url?scp=84926477400&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84926477400&partnerID=8YFLogxK
U2 - 10.1109/ASPDAC.2015.7059092
DO - 10.1109/ASPDAC.2015.7059092
M3 - Conference contribution
AN - SCOPUS:84926477400
T3 - 20th Asia and South Pacific Design Automation Conference, ASP-DAC 2015
SP - 701
EP - 706
BT - 20th Asia and South Pacific Design Automation Conference, ASP-DAC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 January 2015 through 22 January 2015
ER -