TY - GEN
T1 - Resource oblivious sorting on multicores
AU - Cole, Richard
AU - Ramachandran, Vijaya
PY - 2010
Y1 - 2010
N2 - We present a new deterministic sorting algorithm that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(n logn) time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(logn loglogn), which improves on previous bounds for deterministic sample sort. Given a multicore computing environment with a global shared memory and p cores, each having a cache of size M organized in blocks of size B, our algorithm can be scheduled effectively on these p cores in a cache-oblivious manner. We improve on the above cache-oblivious processor-aware parallel implementation by using the Priority Work Stealing Scheduler (PWS) that we presented recently in a companion paper [12]. The PWS scheduler is both processor- and cache-oblivious (i.e., resource oblivious), and it tolerates asynchrony among the cores. Using PWS, we obtain a resource oblivious scheduling of our sorting algorithm that matches the performance of the processor-aware version. Our analysis includes the delay incurred by false-sharing. We also establish good bounds for our algorithm with the randomized work stealing scheduler.
AB - We present a new deterministic sorting algorithm that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(n logn) time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(logn loglogn), which improves on previous bounds for deterministic sample sort. Given a multicore computing environment with a global shared memory and p cores, each having a cache of size M organized in blocks of size B, our algorithm can be scheduled effectively on these p cores in a cache-oblivious manner. We improve on the above cache-oblivious processor-aware parallel implementation by using the Priority Work Stealing Scheduler (PWS) that we presented recently in a companion paper [12]. The PWS scheduler is both processor- and cache-oblivious (i.e., resource oblivious), and it tolerates asynchrony among the cores. Using PWS, we obtain a resource oblivious scheduling of our sorting algorithm that matches the performance of the processor-aware version. Our analysis includes the delay incurred by false-sharing. We also establish good bounds for our algorithm with the randomized work stealing scheduler.
UR - http://www.scopus.com/inward/record.url?scp=77955314108&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955314108&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-14165-2_20
DO - 10.1007/978-3-642-14165-2_20
M3 - Conference contribution
AN - SCOPUS:77955314108
SN - 3642141641
SN - 9783642141645
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 226
EP - 237
BT - Automata, Languages and Programming - 37th International Colloquium, ICALP 2010, Proceedings
T2 - 37th International Colloquium on Automata, Languages and Programming, ICALP 2010
Y2 - 6 July 2010 through 10 July 2010
ER -