TY - GEN

T1 - Revisiting the cache miss analysis of multithreaded algorithms

AU - Cole, Richard

AU - Ramachandran, Vijaya

N1 - Copyright:
Copyright 2012 Elsevier B.V., All rights reserved.

PY - 2012

Y1 - 2012

N2 - This paper revisits the cache miss analysis of algorithms when scheduled using randomized work stealing (RWS) in a parallel environment where processors have private caches. We focus on the effect of task migration on cache miss costs, and in particular, the costs of accessing "hidden" data typically stored on execution stacks (such as the return location for a recursive call). Prior analyses, with the exception of [1], do not account for such costs, and it is not clear how to extend them to account for these costs. By means of a new analysis, we show that for a variety of basic algorithms these task migration costs are no larger than the costs for the remainder of the computation, and thereby recover existing bounds. We also analyze a number of algorithms implicitly analyzed by [1], namely Scans (including Prefix Sums and Matrix Transposition), Matrix Multiply (the depth n in-place algorithm, the standard 8-way divide and conquer algorithm, and Strassen's algorithm), I-GEP, finding a longest common subsequence, FFT, the SPMS sorting algorithm, list ranking and graph connected components; we obtain sharper bounds in many cases. While this paper focusses on the RWS scheduler, the bounds we obtain are a function of the number of steals, and thus would apply to any scheduler given bounds on the number of steals it induces.

AB - This paper revisits the cache miss analysis of algorithms when scheduled using randomized work stealing (RWS) in a parallel environment where processors have private caches. We focus on the effect of task migration on cache miss costs, and in particular, the costs of accessing "hidden" data typically stored on execution stacks (such as the return location for a recursive call). Prior analyses, with the exception of [1], do not account for such costs, and it is not clear how to extend them to account for these costs. By means of a new analysis, we show that for a variety of basic algorithms these task migration costs are no larger than the costs for the remainder of the computation, and thereby recover existing bounds. We also analyze a number of algorithms implicitly analyzed by [1], namely Scans (including Prefix Sums and Matrix Transposition), Matrix Multiply (the depth n in-place algorithm, the standard 8-way divide and conquer algorithm, and Strassen's algorithm), I-GEP, finding a longest common subsequence, FFT, the SPMS sorting algorithm, list ranking and graph connected components; we obtain sharper bounds in many cases. While this paper focusses on the RWS scheduler, the bounds we obtain are a function of the number of steals, and thus would apply to any scheduler given bounds on the number of steals it induces.

UR - http://www.scopus.com/inward/record.url?scp=84860813040&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84860813040&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-29344-3_15

DO - 10.1007/978-3-642-29344-3_15

M3 - Conference contribution

AN - SCOPUS:84860813040

SN - 9783642293436

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 172

EP - 183

BT - LATIN 2012

T2 - 10th Latin American Symposiumon Theoretical Informatics, LATIN 2012

Y2 - 16 April 2012 through 20 April 2012

ER -